BIOTINYLATION OF PROTEINS

BACKGROUND OF THE INVENTION 

CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation-in-part
application of U.S. Serial Number 08/099,991, filed July 30,
1993, which is hereby incorporated by reference and benefit is
claimed of its filing date.

1. Field of the Invention

The present invention relates to methods for 
producing biotinylated proteins in vitro and in recombinant 
host cells. The invention therefore relates to the field of 
molecular biology, but given the diverse uses for recombinant
proteins, this invention also relates to the fields of
chemistry, pharmacology, biotechnology, and medical
diagnostics.

2. Description of the Background Art

The ability to synthesize DNA chemically has made
possible the construction of peptides and proteins not
otherwise found in nature and useful in a wide variety of
methods that would otherwise be very difficult or impossible
to perform. One illustrative example of this technology
relates to the class of molecules known as receptors.
Receptor proteins mediate important biological functions
through interactions with ligands. For many years,
researchers have attempted to isolate and identify ligands
that interact with receptors in ways that can help ameliorate
human (and other) disease. The advent of molecular biology
has revolutionized the way these researchers study
receptor-ligand interaction. For instance, standard molecular
biology techniques have enabled the cloning and high-level
expression of many receptors in recombinant host cells.
The patent literature, for instance, is replete with
publications describing the recombinant expression of receptor 
proteins. See, e.g., PCT Patent Pub. No. 91/18982 and U.S. 
Patent Nos. 5,081,228 and 4,968,607, which describe 
recombinant DNA molecules encoding the IL-1 receptor; U.S.
Patent Nos. 4,816,565; 4,578,335; and 4,845,198, which
describe recombinant DNA and proteins relating to the IL-2
receptor; PCT Patent Pub. No. 91/08214, which describes EGF
receptor gene related nucleic acids; PCT Patent Pub. No.
91/16431 and U.S. Patent No. 4,897,264, which describe the 
interferon gamma receptor and related proteins and nucleic 
acids; European Patent Office (EPO) Pub. No. 377,489, which 
describes the C5a receptor protein; PCT Patent Pub. No. 
90/08822, which describes the EPO receptor and related nucleic 
acids; and PCT Patent Pub. No. 92/01715, which describes MHC 
receptors. 

Several of the above publications not only describe 
how to isolate a particular receptor protein (or the gene 
encoding the protein) but also describe variants of the 
receptor that may be useful in ways the natural or native 
receptor is not. For instance, PCT Patent Pub. No. 91/16431 
describes soluble versions of the gamma interferon receptor, 
while PCT Patent Pub. No. 92/01715 describes how to produce 
soluble cell-surface dimeric proteins. This latter technology 
involves expression of the receptor with a signal for lipid 
attachment; once the lipid is attached to the receptor, the 
receptor becomes anchored in the cell membrane, where the 
dimeric form of the receptor is assembled. See also U.S. 
Serial No. 947,339, filed on September 18, 1992, and 
incorporated herein by reference for all purposes, which 
describes how HPAP-containing receptors can be cleaved from 
the cell surface and how the anchoring sequences that remain 
can serve as recognition sequences for antibodies that are 
used to immobilize the receptor. 

The advances made with respect to receptor cloning
and expression have been accompanied by advances in technology
relating to methods for screening a receptor against compounds
that may interact with the receptor in a desired fashion. One
such advance relates to the generation of large numbers of
compounds, or potential ligands, in a variety of random and 
semi-random "peptide diversity" generation systems. These 
systems include the "peptides on plasmids" system described in 
U.S. patent application Serial No. 963,321, filed October 15, 
1992, which is a continuation-in-part of U.S. patent 
application Serial No. 778,233, filed October 16, 1991; the
"peptides on phage" system described in U.S. Patent
Application Serial No. 718,577, filed June 20, 1991, which is
a continuation-in-part of Serial No. 541,108, filed June 20,
1990; Cwirla et al., August 1990, Proc. Natl. Acad. Sci. USA
87:6378-6382; Barrett et al., 1992, Analyt. Biochem. 204:
357-364; and PCT Patent Pub. Nos. 91/18980 and 91/19818; the
phage-based antibody display systems described in U.S. patent
application Serial No. 517,659, filed May 11, 1990, and PCT
Patent Pub. No. 91/17271; the bead-based systems for
generating and screening nucleic acid ligands described in PCT
Pub. Nos. 91/19813, 92/05258, and 92/14843; the bead-based
system described in U.S. Patent Application Serial No.
946,239, filed September 16, 1992, which is a
continuation-in-part of Serial No. 762,522, filed September
18, 1991; and the "very large scale immobilized polymer
synthesis" system described in U.S. Patent No. 5,143,854; PCT
Patent Pub. Nos. 90/15070 and 92/10092; U.S. Patent
Application Serial No. 624,120, filed December 6, 1990; Fodor
et al., 15 February 1991, Science 251:767-773; Dower and
Fodor, 1991, Ann. Rep. Med. Chem. 26:271-280; and U.S. patent
application Serial No. 805,727, filed December 6, 1991. Each 
of the above references is incorporated herein by reference 
for all purposes. 

Other developments relate to how the receptor is
used in such screening methods. One important advance relates 
to the development of reagents and methods for immobilizing 
one or more receptors in a spatially defined array, as 
described in PCT Patent Pub. No. 91/07087. In one embodiment 
of this method, a receptor is attached to avidin and then 
immobilized on a surface that bears biotin groups. The 
surface is first prepared, however, with caged biotin groups,
which will not bind avidin until the caging group is removed
by, in this embodiment, irradiation. Once the avidinylated
receptor is bound to the biotin groups on the surface, the 
surface can be used in screening compounds against the 
receptor. 

Biotin is a coenzyme that is covalently attached to 
several enzymes involved in the transfer of activated carboxyl 
groups. As the above example illustrates, biotin labeling of 
molecules not normally biotinylated can be used to label, 
detect, purify, and/or immobilize such molecules. These 
methods also rely upon the proteins avidin and streptavidin,
which bind very tightly and specifically to biotin, and other
biotin-binding molecules, some of which bind to biotin with
different affinity than avidin. Typically, the biotinylated
molecules used in such methods are prepared by an in vitro
biotinylation process. A method for biotinylating proteins
synthesized by recombinant DNA techniques in vivo would
eliminate the need to biotinylate these proteins chemically
after purification and would greatly simplify the purification
process, due to the ability to use the biotin as an affinity
tag (see Green, 1975, Protein Res. 29:85-133,
incorporated herein by reference).

Biotin is added to proteins in vivo through the 
formation of an amide bond between the biotin carboxyl group 
and the epsilon-amino group of specific lysine residues in a 
reaction that requires ATP. In normal E. coli, only one
protein is biotinylated, the biotin carboxyl carrier protein
(BCCP) subunit of acetyl-CoA carboxylase. This reaction is
catalyzed by the biotin-protein ligase (BirA), the product of
the birA gene (see Cronan, 1989, Cell 58:427-429,
incorporated herein by reference).

Others have proposed a means by which biotin 
labeling can be accomplished in vivo by the addition of a 
domain of at least 75 amino acids to recombinant proteins (see 
Cronan, 1990, J. Biol. Chem. 265:10327-10333, incorporated
herein by reference). See also Cress et al., 1993, Promega
Notes 42:2-7. Addition of this 75 amino acid domain to
several different proteins leads to the biotinylation of the
fusion proteins by BirA on a specific lysine of the added
domain. Addition of smaller fragments of the 75 residue
domain does not lead to biotinylation, implying that a
reasonably complex recognition domain is required. Changes in
the sequence of biotinylated proteins as far as 33 residues
from the modified lysine abolish biotinylation (see Murtif and
Samols, 1987, J. Biol. Chem. 262:11813-11816). Changes close
to the lysine also affect biotinylation (see Shenoy et al.,
1988, FASEB J. 2:2505-2511, and Shenoy et al., 1992, J. Biol.
Chem.). However, the addition
of such a large protein domain may negatively affect the
biochemical properties of a biotinylated protein. Smaller 
domains that specify biotinylation would be very beneficial, 
in that such domains would have a minimal structural effect on 
the wide variety of possible fusion partners. Also, the 75
residue domain does not lead to complete biotinylation of the
domain, and improved domains could be more efficient. The
present invention provides such improved biotinylation
domains.

SUMMARY OF THE INVENTION

The present invention provides useful compounds,
reagents, methods, and kits for biotinylating proteins. In a
first aspect, the present invention provides a method for
biotinylating a protein, said method comprising: (a) 
constructing a recombinant DNA expression vector that encodes 
a fusion protein comprising said protein and a biotinylation 
peptide less than 50 amino acids in length; (b) transforming a 
recombinant host cell capable of synthesizing a biotinylation 
enzyme with said vector; and (c) culturing said host cell 
under conditions in which biotin is present and such that said 
fusion protein and biotinylation enzyme are expressed, 
resulting in biotinylation of said fusion protein. If the 
host cell does not naturally produce biotin, then one can add 
biotin to the media. In a preferred embodiment, the host cell 
is E. coli, and the biotinylation enzyme is BirA.

Thus, in the preferred embodiment, a biotinylation
peptide of the present invention can be added to any protein
expressed in E. coli with a sufficient time of retention in
the cytoplasm to permit BirA to act. If high expression
levels of biotinylated protein are desired, then one can
readily overexpress the BirA protein at the same time (see
Buoncristiani et al., 1988, J. Biol. Chem. 263:1013-1016,
incorporated herein by reference). In similar fashion, host
cells that lack an endogenous biotin protein ligase (called a
biotinylation enzyme) can be transformed with a vector that
codes for expression of the birA gene to provide or enhance
their ability to biotinylate recombinant proteins. Where, due
to the conservation of the recognition domains, the endogenous
biotin-protein ligase of other non-E. coli cell types
recognizes the novel biotinylation sequences, no such
recombinant expression of a biotinylation enzyme is required.
One can also perform the biotinylation reaction in vitro using 
a biotinylation enzyme such as purified BirA (see 
Buoncristiani, supra), biotin, and biotinylation sequence
peptide-tagged proteins, which proteins may be either produced 
in recombinant host cells or by in vitro translation. One can 
also use biotin analogues, such as 2-iminobiotin, which has a 
lower affinity for avidin than biotin and so may be preferred 
for some applications, in place of biotin, in the method. 

The present invention also provides reagents useful 
in the present method, including peptides, proteins, 
oligonucleotides, and recombinant DNA expression vectors. 
Thus, the present invention provides biotinylated peptides 
less than 50 amino acids in length, typically 10 to 20 or more 
amino acids in length, and oligonucleotides comprising coding 
sequences for such peptides. In addition, the invention 
provides recombinant biotinylated proteins and expression 
vectors encoding those proteins. In a preferred embodiment
the present biotinylation peptide is 13 amino acids long and
is defined by LX1X2IX3X4X5X6KX7X8X9X10 (SEQ. ID NO:1), where
X1 is any amino acid; X2 is any amino acid other than large
hydrophobic amino acids (such as L, V, I, W, F, Y); X3 is F or
L; X4 is E or D; X5 is A, G, S, or T; X6 is Q or M; X7 is I,
M, or V; X8 is E, L, V, Y, or I; X9 is W, Y, V, F, L, or I;
and X10 is preferably R or H but may be any amino acid other
than acidic amino acids such as D or E. 
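
The 13-residue consensus of SEQ. ID NO:1 can be illustrated as a simple
pattern match. The following is a minimal sketch only, assuming that the
consensus is read as thirteen consecutive positions with the biotinylated
lysine at the ninth position, and that "large hydrophobic" at X2 means
exactly the six residues listed (L, V, I, W, F, Y). The function name
find_biotinylation_motif is hypothetical and is not part of the disclosure.

```python
import re

# The 20 standard amino acids, single-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# X2: any amino acid except the large hydrophobic residues listed in the text.
X2 = "".join(a for a in AMINO_ACIDS if a not in "LVIWFY")
# X10: any amino acid except the acidic residues D and E.
X10 = "".join(a for a in AMINO_ACIDS if a not in "DE")

# L X1 X2 I X3 X4 X5 X6 K X7 X8 X9 X10 (assumed reading of SEQ. ID NO:1).
MOTIF = re.compile(
    "L"
    + f"[{AMINO_ACIDS}]"  # X1: any amino acid
    + f"[{X2}]"           # X2
    + "I"
    + "[FL]"              # X3
    + "[ED]"              # X4
    + "[AGST]"            # X5
    + "[QM]"              # X6
    + "K"                 # lysine that becomes biotinylated
    + "[IMV]"             # X7
    + "[ELVYI]"           # X8
    + "[WYVFLI]"          # X9
    + f"[{X10}]"          # X10
)

def find_biotinylation_motif(sequence):
    """Return the 0-based index of the motif lysine, or None if no match."""
    match = MOTIF.search(sequence.upper())
    if match is None:
        return None
    return match.start() + 8  # the lysine is the 9th residue of the motif

print(find_biotinylation_motif("LHHILDAQKMVWNHR"))
print(find_biotinylation_motif("LAGTFEALKMAWHEH"))
```

A peptide that fails this check is not necessarily a poor substrate; the
pattern encodes only the preferred embodiment described above.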

In summary, this invention provides a simple and
efficient means to biotinylate recombinant proteins, providing
for rapid purification, immobilization, labeling, and 
detection of those proteins. The method is useful for a 
variety of purposes and is widely commercially useful for 
research and diagnostic applications. 

DESCRIPTION OF THE PREFERRED EMBODIMENT
I. Definitions

For purposes of understanding the present invention, 
the following terms are defined. 

Amino acid residues in peptides are abbreviated as 
follows: Phenylalanine is Phe or F; Leucine is Leu or L; 
Isoleucine is Ile or I; Methionine is Met or M; Valine is Val
or V; Serine is Ser or S; Proline is Pro or P; Threonine is
Thr or T; Alanine is Ala or A; Tyrosine is Tyr or Y; Histidine
is His or H; Glutamine is Gln or Q; Asparagine is Asn or N;
Lysine is Lys or K; Aspartic Acid is Asp or D; Glutamic Acid
is Glu or E; Cysteine is Cys or C; Tryptophan is Trp or W;
Arginine is Arg or R; and Glycine is Gly or G.

The term "antibody" refers to antibodies and 
antibody fragments that retain the ability to bind the epitope 
that the intact antibody binds, whether the antibody or
fragment is produced by hybridoma cell lines, by immunization
to elicit a polyclonal antibody response, or by recombinant
host cells that have been transformed with a recombinant DNA
expression vector that encodes the antibody or antibody
fragment. 

The term "antigen" is defined as a molecule that 
induces the formation of an antibody or is capable of binding 
specifically to the antigen-binding sites of an antibody. 

The term "effective amount" refers to an amount 
sufficient to induce a desired result. 

The term "epitope" refers to that portion of an
antigen that interacts with an antibody.

The term "host cell" refers to a eukaryotic or
procaryotic cell or group of cells that can be or has been
transformed by a recombinant DNA vector. For purposes of the
present invention, procaryotic host cells are preferred.

The term "ligand" refers to a molecule that is 
recognized by a particular receptor. The agent bound by or 
reacting with a receptor is called a "ligand," a term which is 
definitionally meaningful primarily in terms of its 
counterpart receptor. The term "ligand" does not imply any 
particular molecular size or other structural or compositional 
feature other than that the substance in question is capable
of binding or otherwise interacting with the receptor. A
"ligand" may serve either as the natural ligand to which the
receptor binds or as a functional analogue that may act as an 
agonist or antagonist. Examples of ligands that can be 
investigated with the present invention include, but are not 
restricted to, peptides and proteins such as agonists and 
antagonists for cell membrane receptors, toxins and venoms, 
epitopes such as viral epitopes, antibodies, hormones, enzyme 
substrates, and proteins. 

The term "linker" or "spacer" refers to a molecule 
or group of molecules (such as a monomer or polymer) that 
connects two molecules and often serves to place the two 
molecules in a preferred configuration, e.g., so that a ligand 
can bind to a receptor with minimal steric hindrance. 

The term "monomer" refers to any member of the set 
of molecules that can be joined together to form an oligomer 
or polymer. The set of monomers useful in the present 
invention includes, but is not restricted to, for the example 
of peptide synthesis, the set of L-amino acids, D-amino acids, 
or synthetic amino acids. As used herein, "monomer" refers to 
any member of a basis set for synthesis of an oligomer. For 
example, dimers of L-amino acids form a basis set of 400 
"monomers" for synthesis of polypeptides. Different basis 
sets of monomers may be used at successive steps in the 
synthesis of a polymer. The term "monomer" also refers to a
chemical subunit that can be combined with a different
chemical subunit to form a compound larger than either subunit
alone.

The term "oligomer" or "polymer" refers to the
compounds formed by the chemical or enzymatic addition of two
or more monomers to one another. Such oligomers include, for
example, linear, cyclic, and branched polymers of nucleic
acids and peptides, which peptides can have either alpha-,
beta-, or omega-amino acids.

The term "oligonucleotide" refers to a
single-stranded DNA or RNA molecule or to analogs of either.
Suitable oligonucleotides may be prepared by the
phosphoramidite method described by Beaucage et al., 1981,
Tetr. Lett. 22:1859-1862, or by the triester method, according
to Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, or by
other methods, such as by using commercially available,
automated oligonucleotide synthesizers.

The term "operably linked" refers to the placement
of one nucleic acid into a functional relationship with
another nucleic acid. For instance, a promoter is "operably
linked" to a coding sequence if the promoter causes the
transcription of the coding sequence. Generally, "operably
linked" means that the DNA sequences being linked are
contiguous and, where necessary to join two peptide or protein
coding regions, in reading frame with one another.

The term "peptide" refers to an oligomer in which
the monomers are amino acids (usually alpha-amino acids)
joined together through amide bonds. Alternatively, a
"peptide" can be referred to as a "polypeptide." Peptides are
more than two amino acid monomers long, but more often are
more than 5 to 10 amino acid monomers long and can be even
longer than 20 amino acids, although peptides longer than 20
amino acids are more likely to be called "polypeptides."

The term "protein" is well known in the art and
usually refers to a very large polypeptide, or set of
associated polypeptides, that has some biological function.
For purposes of the present invention the terms "peptide,"
"polypeptide," and "protein" are largely interchangeable as
libraries of all three types can be prepared using
substantially similar methodology.

The term "random peptide" refers to an oligomer
composed of two or more amino acid monomers and constructed by 
a means with which one does not entirely preselect the 
specific sequence of any particular oligomer. The term 
"random peptide library" refers not only to a set of 
recombinant DNA vectors that encodes a set of random peptides, 
but also to the set of random peptides encoded by those 
vectors, as well as the fusion proteins containing those 
random peptides. The term "protein library" has a meaning 
similar to "random peptide library," but the different library 
members differ with respect to the amino acid sequence of, or 
coding sequence for, the protein of interest, so that the 
library serves as a collection of related but different 
versions of the same protein. 

The term "receptor" refers to a molecule that has an 
affinity for a given ligand. Receptors may be 
naturally-occurring or synthetic molecules. Receptors can be 
employed in their unaltered natural or isolated state, in a 
recombinant or modified form, or as aggregates with other 
species. Examples of receptors that can be employed in the 
method of the present invention include, but are not 
restricted to, antibodies, cell membrane receptors, monoclonal 
antibodies, antisera reactive with specific antigenic 
determinants (such as on viruses, cells, or other materials),
polynucleotides, nucleic acids, lectins, polysaccharides,
cells, cellular membranes, viruses, and organelles. Receptors
are sometimes referred to in the art as "anti-ligands." As
the term "receptor" is used herein, no difference in meaning
is intended. A "ligand-receptor pair" is formed when a
receptor and ligand have combined through molecular 
recognition to form a complex. 

The terms "recombinant DNA cloning vector" and
"recombinant DNA expression vector" refer to a DNA or RNA
molecule that encodes a useful function and can either be used
to transform a host cell or be introduced into a cell-free
translation system to produce a protein encoded by the vector.
For purposes of the present invention, a cloning vector
typically serves primarily as an intermediate in the
construction of an expression vector; the latter vector is
used to transform or transfect a host cell (or is introduced
into a cell-free transcription and translation system) so that
the transformed host cell (or cell-free transcription and
translation system) produces a protein or other product
encoded by the vector. Such vectors are typically "plasmids,"
which, for purposes of the present invention, are vectors that
can be extrachromosomally maintained in a host cell, but can
also be vectors that integrate into the genome of a host cell.
Those of skill in the art may refer to "cloning vectors," as
defined herein, as "vectors" and to "expression vectors," as
defined herein, as "plasmids."

The term "solid support" refers to a material having
a rigid or semi-rigid surface. Such materials will preferably
take the form of small beads, pellets, disks, chips, or
wafers, although other forms may be used. In some
embodiments, at least one surface of the solid support will be
substantially flat.

The term "surface" refers to any generally
two-dimensional structure on a solid substrate and may have
steps, ridges, kinks, terraces, and the like without ceasing
to be a surface.

The term "synthetic" refers to production by in
vitro chemical or enzymatic synthesis.

II. Methods and Reagents of the Invention

The random peptide generating and screening system 
known as the "peptides on plasmids" system was used to 
discover the small, efficient peptide biotinylation sequences 
of the present invention. The library was constructed to 
express peptides of the form: X10IVXAMKMX10 (SEQ. ID NO: 2),
where X denotes a random residue, the other letters are
single-letter abbreviations of amino acids, and the
underlining (of the I, V, A, and two M residues) indicates
slight degeneracy in the codon for the
specified amino acids, as described below. This sequence was
selected based on the known sequences of several biotinylated
proteins (see Samols et al., 1988, J. Biol. Chem.
263:6461-6464, incorporated herein by reference) as shown in
Table 1. As denoted by the ellipses, the sequences below are
only portions of the large sequences believed, prior to the
present invention, to be necessary for biotinylation.



Table 1

TC 1.3S   ...GQTVLVLEAMKMETEINAPTDG...   (SEQ. ID NO: 3)
OADC      ...GEVLLILEAMKMETEIRAAQAG...   (SEQ. ID NO: 4)
cACC      ...GQCFAEIEVMKMVMTLTAGESG...   (SEQ. ID NO: 5)
EcBCCP    ...GNTLCIVEAMKMMNQIEADKSG...   (SEQ. ID NO: 6)
yPC       ...GQPVAVLSAMKMEMIISSPSDG...   (SEQ. ID NO: 7)
hPC       ...GQPLCVLSAMKMETWTSPMEG...    (SEQ. ID NO: 8)
sPC       ...GQPLVLSAMKMETWTSPVTE...     (SEQ. ID NO: 9)
aPC       ...GAPLVLSAMKMETWTAPR...       (SEQ. ID NO: 10)
hPCC      ...GQEICVIEAMKMQNSMTAGKTG...   (SEQ. ID NO: 11)
tbp       ...GQPVLVLEAMKMEHWKAPANG...    (SEQ. ID NO: 12)



The lysine residue that becomes biotinylated is
contained within the "AMKM" (SEQ. ID NO: 13) sequence common to
most of the proteins in Table 1, which are the 1.3S subunit of
Propionibacterium shermanii transcarboxylase (TC 1.3S); the
Klebsiella oxaloacetate decarboxylase (OADC); chicken
acetyl-CoA carboxylase (cACC); the E. coli acetyl-CoA
carboxylase (EcBCCP); the yeast pyruvate carboxylase (yPC);
the human pyruvate carboxylase (hPC); the sheep pyruvate
carboxylase (sPC); the rat pyruvate carboxylase (aPC); the
human propionyl-CoA carboxylase (hPCC); and the tomato
biotinyl peptide (tbp). The sequences of these proteins share
several conserved residues and/or regions having similar
properties (e.g., branched chain amino acids or amidated
acids).

Despite this teaching that a large region is 
required for biotinylation, the peptides on plasmids library 
used to discover the biotinylation peptides of the present 
invention was designed to display random peptides only 27 
amino acids long, containing only one fixed codon (for K) and 
5 conserved codons (for the underlined amino acids above).
Conserved codons were prepared by programming the
oligonucleotide synthesizer to add, for each nucleotide of a
conserved codon, 91% of the correct nucleotide and 3% each of
the three other nucleotides. By this method, a very large
library of random peptides was prepared and used to transform
E. coli host cells. The peptides encoded by each clone of
these libraries were fused to the carboxy-terminus of the
sequence-specific DNA binding protein lac repressor (LacI).
Each library particle consists of a LacI-peptide fusion bound
to the lac operator (lacO) sites on the same plasmid that
encoded it. Expression of these libraries in the cytoplasm
allows the cells to provide compartmentalization, so that each
fusion protein is bound to the appropriate plasmid.

Because the peptides on plasmids library particles 
are cytoplasmic, the random peptide region has access to the 
BirA enzyme in E. coli host cells. Any random peptides that
productively interact with BirA (presumably a small fraction
of the total) become biotinylated. After cell lysis, the
biotinylated library particles were isolated by binding to
immobilized streptavidin. The background of peptide sequences
that bind to streptavidin in the absence of biotinylation (see
Devlin et al., 1990, Science 249:404-406, incorporated herein
by reference) was eliminated by adding free biotin competitor
after allowing the library particles to bind to the
immobilized streptavidin. The affinity of these background
peptides for streptavidin is lower than the affinity of 
biotin, and so background peptides are displaced by the free 
biotin. The desired biotinylated peptides were not displaced 
by biotin, because those peptides are allowed to bind first, 
have an affinity similar to that of biotin, and interact 
multivalently with the immobilized streptavidin. 

Thus, the protocol involved lysing the transformed 
cells, removing cellular debris by centrifugation, and 
collecting the crude lysate, from which plasmids encoding 
biotinylation peptides were isolated by affinity enrichment on 
streptavidin, as described in the Examples below. This 
process was repeated three times, starting with the plasmids
collected at the end of each previous cycle, with the results
shown in Table 2.

Table 2

                        Cycle 1      Cycle 2      Cycle 3       Cycle 4
Input plasmids          6.8 X 10^^   2.3 X 10^^   1.65 X 10^^   8.5 X 10^
Recovered plasmids      7.6 X 10^    1.4 X 10^    3.0 X 10^     1.42 X 10*^
% Recovered             0.00011      0.006        0.0018        0.166
Negative Control
  Recovered Plasmids    N.A.         3.8 X 10^    3.8 X 10^     1.1 X 10^
% Recovered
  (Negative Control)    N.A.         0.00017      0.00022       0.013
Enrichment Factor       N.A.         3.6          8             13


These results indicated that the library contained members 
that displayed biotinylation peptides and so could be enriched 
and identified. Several of the isolates from the fourth round 
of streptavidin binding were tested to determine whether the 
displayed peptides directed biotinylation. The sequences of 
the random peptides in the positive clones are shown in 
Table 3, ranked in order of the strength of their reaction in 
an ELISA. The sequences reveal not only residues that tend to 
move closer to the consensus sequence defined by the known 
biotinylated proteins but also residues that are different 
from the known consensus sequence. The peptides do not have 
sequence motifs (such as HPQ) that have been associated with 
weak binding to streptavidin (Devlin et al., supra).

Table 3

LEEVDSTSSAIFDAMKMVWISPTEFR       (SEQ. ID NO:14)
QGDRDETLPMILRAMKMEVYNPGGHEK      (SEQ. ID NO:15)
SKCSYSHDLKIFEAQKMLVHSYLRVMYNY    (SEQ. ID NO:16)
MASSDDGLLTIFDATKMMFIRT           (SEQ. ID NO:17)
SYMDRTDVPTILEAMKMELHTTPWACR      (SEQ. ID NO:18)
SFPPSLPDKNIFEAMKMYVIT            (SEQ. ID NO:19)
 SWPEPGWDGPFESMKMVYHSGAQSGQ      (SEQ. ID NO:20)
VRHLPPPLPALFDAMKMEFVTSVQF        (SEQ. ID NO:21)
DMTMPTGMIKIFEAMKMEVST            (SEQ. ID NO:22)
ATAGPLHEPDIFLAMKMEWDVimAGQ       (SEQ. ID NO:23)
      SMWETLNAQKTVLL             (SEQ. ID NO:24)
SHPSQLMTNDIFEGMKMLYH             (SEQ. ID NO:25)
SIERGGSTHKILAAMKMYQVSTPSCS       (SEQ. ID NO:26)
TSELSKLDATIFAAMKMQWWNPG          (SEQ. ID NO:27)
VMETGLDLRPILTGMKMDWIPK           (SEQ. ID NO:28)

The sequences of the biotinylated clones from the first 
library, shown in Table 3, are aligned at the presumably
modified K residue. Several clones were present more than
once in the set of 20 sequences obtained, so only 15
independent sequences are shown. At some positions in the 
sequences, no clear consensus is apparent. At other residues, 
however, clear trends emerge. For example, position -4 
(relative to the K residue) was designed to encode V to match 
that residue of E. coli biotin carboxyl carrier protein (GTT 
codon with each base synthesized 91% as designated, 3% each of 
the other bases). In spite of this very light mutagenesis, 
every sequence had a mutation that changed the encoded amino 
acid to either L or F. L is the residue found at this
position in most of the naturally biotinylated sequences from
organisms other than E. coli, but F was not present in the
sequences examined. Residue -3, encoded in the library by a
random (NNK) codon, was negatively charged (E or D) in 9 of
the 15 sequences. Again, this consensus sequence is similar
to that found in the naturally occurring sequences (E or S).
The +3 position, however, defines a new consensus not found in
the natural sequences. 15 of the 15 peptides had a
hydrophobic residue (W, Y, V, F, or L) at +3, instead of the
most commonly found T from the enzyme sequences.
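
To put the "very light mutagenesis" at position -4 in perspective, the
expected amino acid distribution from the doped GTT codon can be worked
out directly. The sketch below is illustrative only: it assumes the
standard genetic code, treats the three codon positions as independent,
and uses the 91%/3%/3%/3% mixture stated above. The function name
doped_codon_distribution is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

def doped_codon_distribution(codon, p_correct=0.91):
    """Probability of each encoded amino acid when every base of `codon`
    is synthesized with probability p_correct and each of the other
    three bases shares the remainder equally."""
    p_other = (1.0 - p_correct) / 3.0
    distribution = {}
    for variant in product(BASES, repeat=3):
        probability = 1.0
        for designed, actual in zip(codon, variant):
            probability *= p_correct if designed == actual else p_other
        amino_acid = CODON_TABLE["".join(variant)]
        distribution[amino_acid] = distribution.get(amino_acid, 0.0) + probability
    return distribution

dist = doped_codon_distribution("GTT")
for amino_acid in ("V", "L", "F"):
    print(amino_acid, round(dist[amino_acid], 4))
```

Under these assumptions roughly 83% of unselected clones would retain V at
this position and only about 5% would encode L or F, which suggests why
finding L or F in every selected sequence points to strong selection at
this position.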

Perhaps the most revealing sequence from the first 
library was SMWETLNAQKTVLL (SEQ. ID NO: 24), which arose from a 
single base deletion during synthesis or cloning of the 
library oligonucleotide. This sequence matches only three 
residues in the enzyme consensus sequence, but does fit the 
pattern of the other library clones at positions +2 and +3. 
These results show that the evolutionary constraints on the 
enzyme sequence result from a combination of factors, only one 
of which is the ability to be biotinylated. 

To define more clearly the consensus sequence for 
biotinylation, three additional libraries were screened (see
Tables 4, 5, and 6, below). Two were based on the pattern
from the clones isolated from the first library, and the other 
consisted simply of a K residue flanked on both sides by 10 
random residues. After four rounds of panning, a restriction 
fragment containing the random region was subcloned from the 
pool of enriched clones into an MBP (maltose binding protein) 
expression vector (see U.S. patent application Serial No. 
876,288, filed April 29, 1992, incorporated herein by 
reference). These populations of plasmids were then screened
using a colony lift technique involving detection with a 
streptavidin-alkaline phosphatase conjugate. The 
biotinylation of several of these clones was confirmed by 
labeling with ³H-biotin.

The second library was constructed with a random 
peptide coding sequence defined by XXXIFEAMKMXXXXX (SEQ. ID
NO: 29); where X is an NNK codon, underlined single residues
are codons for the amino acid shown but with a 70/10/10/10
mutagenesis mixture (70% of the base that encodes the amino
acid at a particular position in the codon and 10% each of the
other three bases), and the codon for K is fixed. The
biotinylated sequences isolated and sequenced from this
library are shown in Table 4.

Table 4

LHHILDAQKMVWNHR (SEQ. ID NO:30)
PQGIFEAQKMLWRS (SEQ. ID NO:31)
LAGTFEALKMAWHEH (SEQ. ID NO:32)
LNAIPEAMKMEYSG (SEQ. ID NO:33)
LGGIFEAMKMELRD (SEQ. ID NO:34)
LLRTFEAMKMDWRNG (SEQ. ID NO:35)
LSTIMEGHKMyiQRS (SEQ. ID NO:36)
XiSDXFEArmllvYRPC (SEQ. ID NO:37)
^L£SmJ£AIlKMQWNPQ (SEQ. ID NO:38)
T g ^ T a Ufvum TV/ n IjoulrpANjWVYKPQ (SEQ. ID NO:39)
LAPFFESIIKMvWREH (SEQ. ID NO:40)
LKGIFEAMKMEYTAM (SEQ. ID NO:41)
LEGIEEAMKMEYSNS (SEQ. ID NO:42)
LLQTFDAMKMEWLPK (SEQ. ID NO:43)
VFDILEAQKWTLRF (SEQ. ID NO:44)
LVSMPDGMKMEWKTL (SEQ. ID NO:45)
LEPIFEAilKHDWRLE (SEQ. ID NO:46)
XkEIFEGMKMEFVKP (SEQ. ID NO:47)
LGGILEAQKMLYRGN (SEQ. ID NO:48)



The third library was constructed with a random
peptide coding sequence defined by XXXXXXIFEAMKMXXXXX (SEQ. ID
NO:49); where X is an NNK codon, underlined single residues
are codons for the amino acid shown but with a 70/10/10/10
mutagenesis mixture (70% of the base that encodes this amino
acid at a particular position in the codon and 10% each of the
other three bases), and the codon for K is fixed. The
biotinylated sequences isolated and sequenced from this
library are shown in Table 5.

Table 5

RPVLENIFEAMKMEVWKP (SEQ. ID NO:50)
RSPIAEIFEAMKMEYRET (SEQ. ID NO:51)
QDSIMPIFEAMKMSWHVN (SEQ. ID NO:52)
D6VLFPIFESMKMTPT.pt (SEQ. ID NO:53)
VSRTMTNPE AMTfMT VTinr (SEQ. ID NO:54)
DVLLPTVPEAMTfMVTTV (SEQ. ID NO:55)
PNDLERT FDAMK 1 vmitwq (SEQ. ID NO:56)
TRATiT.PTPT^A ATTMT v/mjt (SEQ. ID NO:57)
PTW7HT7fZ T 1?TP 1L lunnurif rriTTTiim Jnx^vn vvsxr liAiJlJxPlxTVET (SEQ. ID NO:58)
\vuxvux i^xr iuiCiixXQWTSG (SEQ. ID NO:59)
LtPGTaPAXrpPClunnurvT i^tm? AjAuxixvAVf JLajQJvjyiKi f AuE (SEQ. ID NO:60)
V X f AnfUxTl V WXiO X (SEQ. ID NO:61)
wxiT' xiyi/xxjrioXlixXVMTSG (SEQ. ID NO:62)
RVPLEAIFEGAKMTWVPNM (SEQ. ID NO:63)
PMISHKNFEAMKMLFVPE (SEQ. ID NO:64)
KLGLPAMFEAMKMEWHPS (SEQ. ID NO:65)
QPSLLSIFEAHKMQASLH (SEQ. ID NO:66)
LLELRSNFEAMKMEWQIS (SEQ. ID NO:67)
DEEDffQIFiaMKMyPLVHVTK (SEQ. ID NO:68)



The fourth library was constructed with a random
peptide coding sequence defined by XXXXXXXXXXKXXXXXXXXXX (SEQ. ID
NO: 69); where X is an NNK codon, and the codon for K is fixed.
The biotinylated sequences obtained and sequenced from this 
library are shown in Table 6. 

Table 6 

SNLVSLLHSQKILWTDPQSFG (SEQ. ID NO: 70) 

LFLHDFLNAQKVELYPVTSSG (SEQ. ID NO: 71) 

SDINALLSTQKIYWAH (SEQ. ID NO: 72) 

The biotinylation peptides from these libraries 
serve to define further the novel consensus sequence for the 
biotinylation peptides of the present invention. Several 
features are worth noting. A strong preference for L at
position -8 is clear, especially in the second library, which
had a shorter random sequence region to the left of the
modified K than any of the other libraries. The other sets of
sequences share this preference at -8, but to a lesser extent
than in the second library. The L at this position may be
more important when there are fewer amino acids connecting the 
biotinylation domain to the carrier protein. There is no 
consensus in the naturally occurring sequences at this site. 

At other positions, many residues are found and only 
a general trend is apparent. For example, many residues are 
found at position -6, but not large hydrophobic residues (L,
V, I, W, F, or Y), a tendency that differs from that of the
naturally occurring enzymes (L is most frequent). Position +4
contains a wide variety of residues, but with a clear
preference for basic amino acids (18 of 56 are R, H, or K)
over acidic residues (no D or E).

At position -2, a preference for small size is
clear, as only A, G, S, or T are found. Position -1 was
biased to be M in all libraries except the fourth library. In
these biased libraries, M is found most often, but Q is
frequently present. Notably, the mutation from an M codon
(ATG) to a Q codon (CAA/G) requires two base changes. In the
clones that were unbiased at this position, 4 of 4 clones have
Q, indicating that Q might in fact be the preferred residue.
The hydrophobic residues M, I, and V are found in almost all
of the sequences at position +1. Position +2 is often the
natural consensus E but also tends to contain the hydrophobic
residues L, V, Y, and I.
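
The remark about the M-to-Q change at position -1 can be checked with a short calculation. This sketch assumes independent 70/10/10/10 doping at each base of the ATG codon, as described for the second and third libraries; the helper names are illustrative only.

```python
Q_CODONS = ("CAA", "CAG")  # both glutamine codons in the standard code


def base_changes(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def doped_probability(designed, observed, p_designed=0.70):
    """Probability of synthesizing `observed` when `designed` was intended.

    Assumption: each base is the designed one with probability p_designed
    and each of the other three with (1 - p_designed) / 3.
    """
    p_other = (1.0 - p_designed) / 3.0
    p = 1.0
    for d, o in zip(designed, observed):
        p *= p_designed if d == o else p_other
    return p


min_changes = min(base_changes("ATG", c) for c in Q_CODONS)
p_met = doped_probability("ATG", "ATG")
p_gln = sum(doped_probability("ATG", c) for c in Q_CODONS)
print(min_changes, round(p_met, 3), round(p_gln, 3))
```

Under these assumptions the unselected library would encode M about 34% of the time and Q under 1% of the time, so the frequent recovery of Q is consistent with the suggestion that Q is preferred.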

To explore the general utility of the biotinylation 
sequences and to expand their possible uses, a library was 
made so that the biotinylation peptides would be expressed in 
a fusion protein at the N-terminus of cytoplasmic MBP. This 
library was heavily biased in favor of sequences that fit the
consensus sequence of the invention, with a random peptide
defined by MAXXLXXI(F/L)(E/D)AQK(M/I)EW(H/R)XXXGGS (SEQ. ID
NO:73), in which the underlined residues are fixed; the
underlined residues are 97/1/1/1 mutagenized codons for the
residues shown; and X is an NNK codon. The sequences of
positive clones from this library identified by colony lifts
are shown in Table 7.

Table 7

MASSLRQ*ILDSQKMEWRSNAGGS... (SEQ. ID NO:74)
MAHSLVPIFDAQKIEWRDPFGGS... (SEQ. ID NO:75)
MGPDLVNIFEAQKIEWHPLTGGS... (SEQ. ID NO:76)
MAFSLRSILEAQKMELRNTPGGS... (SEQ. ID NO:77)
MAGGLNDIFEAQKIEWHEDTGGS... (SEQ. ID NO:78)
MSSYLAPIFEAQKIEWHSAYGGS... (SEQ. ID NO:79)
MAKALQ*KILEAQKMEWRSHPGGS... (SEQ. ID NO:80)
MAPQLCKIFYAQKMEWHGVGGGS... (SEQ. ID NO:81)
MAGSLSTIFDAQKIEWHVGKGGS... (SEQ. ID NO:82)
MAQQLPDIFDAQKIEWRIAGGGS... (SEQ. ID NO:83)
MAQRLFHILDAQKIEWHGPKGGS... (SEQ. ID NO:84)
MAGCLGPIFEAQKMEWRHPVGGS... (SEQ. ID NO:85)
MAWSLKPIFDAQKIEWHSPGGGS... (SEQ. ID NO:86)
MALGLTRILDAQKIEWHRDSGGS... (SEQ. ID NO:87)
MAGSLRQILDAQKIEWRRPLGGS... (SEQ. ID NO:88)
MADRLAYILEAQKMEWHPHKGGS... (SEQ. ID NO:89)
Q* = supE suppressed amber codon

The biotinylation of several of these clones was confirmed by
labeling with ³H-biotin. The ability to express functional
biotinylation sequences free at either end of a protein
indicates that there is no requirement that either end of the
peptide be free in order to interact with the biotin 
holoenzyme synthetase. 
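
The positional preferences discussed above can be collected into a simple pattern for scanning candidate sequences. The sketch below is one possible encoding, not a definition from the disclosure: it covers positions -8 through +3 only, and the pattern, the function name, and the reading of the garbled Table 7 entry for SEQ. ID NO:78 as MAGGLNDIFEAQKIEWHEDTGGS are all assumptions made for illustration.

```python
import re

# Positions -8 .. +3 relative to the biotinylated K, as read from the text:
# L, any, not a large hydrophobic, I, F/L, E/D, A/G/S/T, Q/M, K,
# I/M/V, E/L/V/Y/I, W/Y/V/F/L.
CONSENSUS = re.compile(r"L.[^LVIWFY]I[FL][ED][AGST][QM]K[IMV][ELVYI][WYVFL]")


def find_biotinylation_motif(seq):
    """Return (start index, matched segment) or None if no match."""
    m = CONSENSUS.search(seq)
    return (m.start(), m.group()) if m else None


library_clone = "MAGGLNDIFEAQKIEWHEDTGGS"   # assumed reading of a Table 7 clone
natural_like = "GQTVLVLEAMKMETEINAPTDG"     # SEQ ID NO:3 in one-letter code

print(find_biotinylation_motif(library_clone))
print(find_biotinylation_motif(natural_like))
```

A naturally occurring biotin carrier segment such as SEQ ID NO:3 does not match this pattern, which reflects the point made above that the library-derived consensus differs from the enzyme consensus.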

As discussed above, the short biotinylation
peptides of the invention can be biotinylated in vivo or in
vitro and can be used for a wide variety of purposes,
including purification, immobilization, labeling, and
detection of proteins. A few illustrative examples include:
(1) labeling receptors with biotin at a defined site, so that
the labeled receptor could be, for instance, bound to
streptavidin to produce a tetravalent receptor to increase the
sensitivity of binding assays, such as those described in U.S.
Patent No. 5,143,854, and U.S. patent application Serial No.
946,239, filed September 16, 1992, each of which is
incorporated herein by reference; (2) labeling fusion
proteins containing peptide leads from any screening program,
so that the labeled fusion proteins can be used to test
binding of the peptide to receptors in a monovalent format (by
probing with labeled streptavidin after binding occurs) or in
a multivalent format (by prebinding the fusions to labeled
streptavidin and then testing binding to receptors) or so that
the peptides can be immobilized on streptavidin-coated beads
or in microtiter wells for probing with receptors, such as
protease enzymes, in solution; (3) labeling peptides or
proteins directly by growing cells in the presence of
tritiated biotin (with a biotin auxotroph, the peptides
could be labeled at a known specific activity to permit
quantitative measurements of binding activity); and (4)
developing techniques for doing enzymatic reactions on
surfaces by exposing libraries of variant immobilized
sequences to BirA, biotin, and ATP, so that those peptides
that were substrates would be biotinylated and could be
detected with labeled streptavidin.

This invention also embraces kits which are useful
for producing proteins containing biotinylation peptides.
The kits comprise, for instance, a recombinant expression
polynucleotide which can be used to produce the peptides of
the invention fused to a coding sequence of choice, and
directions for using the polynucleotides. DNA expression
polynucleotides may be designed to replicate episomally or to
integrate into the chromosome of the host cell chosen for
expression. Frequently, the DNA polynucleotides of the kit
contain a multiple cloning site linked to sequence coding for
the peptides of the invention, such that any coding sequence
may be inserted in the correct translational reading frame for
expression. These kits may be used to produce the peptides of
the invention fused to the amino terminus, the carboxyl
terminus, or internal to the coding sequence of choice.
Within these fusion proteins, the peptides of the invention
may be separated from the coding sequences by additional
spacer sequences.

Expression of coding sequence will preferably be 
under control of an inducible promoter; some examples are the 
lac or tac promoter in E. coli, the gal4 promoter in S.
cerevisiae, the glaA promoter in Aspergillus niger, or the
murine metallothionein promoter in many mammalian cells.
Alternatively, constitutive promoters may be desirable for
certain applications, such as the SV40 early promoter in
mammalian cells. For some applications, such as in vitro
translation in rabbit reticulocytes, the ability to synthesize
RNA in vitro using an RNA polymerase such as that from the
bacteriophage SP6 will be needed. In that case, signals for
initiation of transcription by both SP6 RNA polymerase and an 
alternative RNA polymerase can be operably linked to the same 
expression sequence. 

Besides a promoter for initiation of the expression 
sequences, the polynucleotides of the kits will also 
preferably contain sequences for transcriptional termination, 
such as the T7 terminator in E. coli or the SV40 terminator in 
mammalian cells. Additionally, when the proteins are 
expressed in mammalian cells, a signal for polyadenylation is 
desirable, such as the SV40 polyadenylation sequence.

Of course, additional sequences may also be included 
in the polynucleotides of these kits which will confer 
additional properties on the proteins produced. For example, 
a signal sequence which causes the expressed proteins to be 
secreted from the cell may be incorporated into the 
polynucleotides. Sequences which serve to link expressed 
proteins to the membrane, such as a sequence encoding a 
hydrophobic membrane spanning domain, or an encoded sequence 
which signals attachment of a glycosyl-phosphatidylinositol 
membrane anchor to the protein, may be included as part of the 
expression polynucleotide. The polynucleotides may also 
encode a sequence recognized by a protease, such as factor Xa, 
adjacent to the sequence encoding the biotinylation peptides 
of the invention. One of skill in the art will recognize that 
these and many other combinations of additional sequences may 
be advantageous. 

Other constituents of the kits may comprise host 
cells suitable for obtaining expression from the 
polynucleotide, avidin or streptavidin coupled to a solid 
support, avidin or streptavidin coupled to a detectable label 
such as the enzyme horseradish peroxidase, a biotinylation 
enzyme such as purified BirA, and instructions for analysis 
and purification of the proteins expressed using these kits.
Preferably, the host cells will express a biotinylating
enzyme. Optionally, polynucleotides which, when transformed 
into host cells, cause the overproduction of biotinylating 
enzymes may be supplied in the kits, or the host cells 
provided with the kits may be already modified to over-produce 
biotinylating enzymes. However, for some applications the 
absence of biotinylating enzyme in the host cell may be
advantageous. For example, the kit user may prefer to
biotinylate the expressed fusion proteins in vitro.

Those of skill in the art recognize from the 
description above that the present invention provides many 
advantages and more applications than prior art methods for
biotinylating proteins. The biotinylation peptides of the
invention are small but specific, allowing one to label a
protein at a defined site, at either end of or internally to
the protein to be labelled. The invention provides an
improved immobilization method, allowing one to avoid the use
of antibodies and the problems attendant thereto. The high
binding affinity of the avidin-biotin interaction provides
advantages for labelling, localization, detection,
immobilization, and purification methods as well. For
instance, one could use the biotinylation peptides of the
invention to purify BirA protein or other biotinylation
enzymes. The peptides of the invention can serve as the
substrate in an assay to screen for the presence of novel 
biotinylation enzymes. The biotinylation reaction can occur 
in vivo (where few other proteins are naturally biotinylated) 
or in vitro, with readily available materials. As can be 
appreciated from the disclosure above, the present invention 
has a wide variety of applications. Accordingly, the 
following examples are offered by way of illustration, not by 
way of limitation. 

Example 1 
Library Construction 
The peptides on plasmids libraries were made in
vector pJS142, a derivative of plasmid pMC5 described in U.S.
patent application Serial No. 963,321, filed October 15, 1992,
incorporated herein by reference. This vector is designed to
link the random region of a library to lacI through a linker
encoding the sequence WHGEQVGGEASGGG (SEQ. ID NO: 90). The
first library was made by annealing phosphorylated
oligonucleotides ON-1396
(GAGGTGGTOTflKNNKNNKlTOKNNKNNKNNKNN^^ 

^NNKNNKNNKNNKNNKNNKNNKNNKN^ (SEQ. ID NO: 91) , 

where lower case letters designate bases synthesized from
mixtures of 91% of that base and 3% of each of the other
bases, referred to as "91/3/3/3 mutagenesis", N means an
equimolar mixture of all 4 bases, and K means an equimolar
mixture of G and T), ON-829 (ACCACCTCCGG) (SEQ. ID NO:92), and
ON-830 (TTACTTAGTTA) (SEQ. ID NO: 93), each at a concentration
of 1 µM in 0.1 M NaCl, 50 mM Tris pH 7.4, by heating to 70°
for 10 min., and allowing the reaction to cool over several
hours to below 15°. The annealed oligonucleotides (5.2 pmol)
were ligated to 10 µg (2.6 pmol) of SfiI digested pJS142 in
0.5 mL of 20 mM Tris pH 7.4, 10 mM MgCl2, 0.1 mM EDTA, 1 mM
ATP, 50 µg/mL BSA, 2 mM DTT, containing 800 cohesive end units
of T4 DNA ligase (New England BioLabs) overnight at 14°. The
ligations were then heated to 65° for 10 min. The single
stranded gap was filled by addition of 26 units of Sequenase
2.0 (United States Biochemical) in the presence of 0.2 mM
dNTPs. The DNA was phenol/CHCl3 extracted, precipitated with
isopropanol, and used to transform ARI 280 (lon-11 sulA1
hsdR17 Δ(ompT-fepC) ΔclpA319::kan ΔlacI lacZU118
recA::cat) to yield a library of 5 x 10^ independent
transformants that was amplified and stored as described in
U.S. patent application Serial No. 963,321, filed October 15,
1992, and Cull et al., 1992, Proc. Natl. Acad. Sci. USA
89:1865-1869, each of which is incorporated herein by
reference, except that the cells were stored in 35 mM HEPES
pH 7.5, 0.1 mM EDTA, 50 mM KCl (HEK buffer).
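
The molar amounts quoted for the ligation can be cross-checked with a short calculation. This is a rough sketch only: it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the text, and the function name is illustrative.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; approximation (assumption)


def implied_length_bp(mass_ug, amount_pmol):
    """Length of a double-stranded DNA implied by a mass and a molar amount."""
    grams = mass_ug * 1e-6
    moles = amount_pmol * 1e-12
    return grams / moles / AVG_BP_MASS


vector_bp = implied_length_bp(10.0, 2.6)      # 10 ug stated as 2.6 pmol
insert_to_vector = 5.2 / 2.6                  # annealed oligos : vector
vector_nM = 2.6e-12 / 0.5e-3 * 1e9            # 2.6 pmol in 0.5 mL, in nM

print(round(vector_bp), round(insert_to_vector, 2), round(vector_nM, 2))
```

With these assumptions the stated amounts imply a vector of roughly 5.9 kb, a 2:1 molar ratio of annealed oligonucleotides to vector, and a vector concentration of about 5 nM in the ligation.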

The second (5 x 10^ transformants), third (5 x 10^
transformants), and fourth (2.2 x 10^ transformants) libraries
were constructed as described above using ON-1544

(GAGGTGGTNNKNNKNNKatctttgaagctatgAAAatgNNKNNKNNKNNKNNKTAACTA^^ 
TAAAGC) (SEQ. ID NO: 94), where lower case letters designate
70/10/10/10 mutagenesis), ON-1545

(GAGGTGGTNNKNNKNNKNNKNNKNNKatctttgaagctatgAAAatgNNKN^ 
KTAACTAAGTAAAGC) (SEQ. ID NO: 95), where lower case letters
designate 70/10/10/10 mutagenesis), and ON-828
. (GAGGTGGTNNKNNKNNKNNKNNKNNKNNKNN^^ 

KNNKNNKNNKTAACTAAGTAAAGC) (SEQ. ID NO: 96), respectively, in
place of ON-1396. The fourth library was made with 30 µg of
vector pJS141, which differs from pJS142 only in that the
coding sequence of lacI was altered to encode S, A, and S,
respectively, in place of the C codons normally found at
positions 107, 140, and 281. The library was amplified by
transformation of strain ARI 246 (lon-11 sulA1 hsdR17
Δ(ompT-fepC) ΔclpA319::kan lacI42::Tn10 lacZU118).

The fifth library was constructed in the vector
pBAD/MBP-N, a derivative of pBAD18 (see U.S. patent
application Serial No. 965,677, filed October 22, 1992,
incorporated herein by reference) that places a polylinker and
the coding sequence for amino acids 27-393 of MBP downstream
from the arabinose-inducible araB promoter. The library was
made by ligating annealed ON-1699

(CTAGCTAACTAATGGAGGATACATAAATGgctNNKNNKctgNNKNVKattttNgaNgctca 
rAAAatNgaatggcryNNKNNKNNKGGTGGTAGCC) (SEQ. ID NO: 97), where
lower case letters designate 97/1/1/1 mutagenesis; V=A, C, or
G; r=g or a; y=c or t), ON-1700 (TCCTCCATTAGTTAG) (SEQ. ID
NO:98), and ON-1701 (TCGAGGCTACCACC) (SEQ. ID NO:99) to
NheI-XhoI digested pBAD/MBP-N, as described above. The
library was used to transform XL1-Blue (F' proAB lacIq
lacZΔM15 Tn10(tetR) // recA1 endA1 gyrA96 thi hsdR17
supE44 relA1 lac, Stratagene) and screened by colony lifts
as described below.
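
The relationship between the degenerate oligonucleotide ON-1699 and the peptide pattern of the fifth library can be illustrated by translating the coding region codon by codon. The sketch below is an illustration under stated assumptions: the coding region is taken to start at the ATG that immediately precedes "gct", lower-case (97/1/1/1) bases are treated as the designed base only, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes used in the oligonucleotide descriptions.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "V": "ACG", "R": "AG", "Y": "CT"}

# Assumed reading of the ON-1699 coding region, split into codons.
CODONS = ("ATG gct NNK NNK ctg NNK NVK att ttN gaN gct car AAA atN "
          "gaa tgg cry NNK NNK NNK GGT GGT AGC").split()


def codon_amino_acids(codon):
    """Set of residues (and '*' for stop) encoded by a degenerate codon."""
    first, second, third = (IUPAC[b] for b in codon.upper())
    return {CODON_TABLE[a + b + c] for a in first for b in second for c in third}


def pattern_symbol(codon):
    """One residue, a two-way choice such as (F/L), or X for many residues."""
    aas = sorted(codon_amino_acids(codon) - {"*"})
    if len(aas) == 1:
        return aas[0]
    if len(aas) == 2:
        return "(" + "/".join(aas) + ")"
    return "X"


pattern = "".join(pattern_symbol(c) for c in CODONS)
print(pattern)
```

The NNK codon covers all 20 amino acids plus the amber stop codon (TAG), which is consistent with suppressed amber codons appearing at random positions among the selected clones.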

Example 2 
Panning 

About 2 mL of thawed cells in HEK were added to 6 mL
of 25 mM HEPES pH 7.5, 0.07 mM EDTA, 8.3% glycerol, 1.25 mg/mL
BSA, 0.83 mM DTT, and 6.2 mM PMSF. The cells were lysed for 2
to 4 min. on ice by the addition of 0.15 mL of 10 mg/mL
lysozyme (Boehringer Mannheim), and then 2 mL of 20% lactose
and 0.25 mL of 2 M KCl were added. The supernatant from a 15
min., 27,000 x g centrifugation was added to 0.1 mL of
streptavidin-agarose beads (Pierce) in 1 mL HEK, 0.2 M lactose
(HEKL), 4.5% BSA, 0.9 mg/mL herring DNA and mixed gently at 4°
for 1 hour. The beads were centrifuged and washed 4 times
with HEKL buffer, 1% BSA, and 0.1 mg/mL herring DNA at 4° (in
later rounds, these washes sometimes contained 10 µM biotin)
and then incubated for 30 min. at 4° in the same buffer plus
10 µM biotin. The beads were washed 5 times with HEKL buffer,
1% BSA, twice with HEKL buffer, and once or twice with HEK
buffer at 4°. The bound plasmids were eluted with 35 mM HEPES
pH 7.5, 0.1 mM EDTA, 200 mM KCl, 1 mM IPTG, 10 µg/mL
self-annealed ON-413 (GAATTCAATTGTGAGCGCTCACAATTGAATTC) (SEQ.
ID NO:100) for 30 min. at room temperature, precipitated with
isopropanol, and then used to electrotransform either ARI 280
or ARI 298 (lon-11 sulA1 hsdR17 Δ(ompT-fepC) ΔclpA319::kan
ΔlacI lacZU118 recA::cat cytR) for amplification.

Example 3 
Subcloning into MBP vector
Plasmids recovered from panning were digested with
BspEI and ScaI, and a fragment containing the peptide coding
sequence was subcloned into AgeI, ScaI digested plasmid
pELM3, a derivative of pMALc2, which is available from New
England Biolabs, designed to accept inserts of coding sequence
from pJS142. The transferred fragment encodes GGG-peptide and
is linked to the MBP coding sequence through sequence encoding
N10LGIEGRT. The MBP is retained in the cytoplasm due to the
lack of a signal sequence.

Example 4 
Labeling with ³H-biotin
Cells were grown at 37° overnight in minimal medium
E (Davis, 1980, Advanced Bacterial Genetics (CSH Press)) with
0.4% glycerol, 0.1% vitamin assay casamino acids (Difco), 1
µg/mL thiamine, and 50 µg/mL ampicillin. The cultures were
diluted 1/10 in the same medium, grown for several hours, and
then added to an equal volume of medium containing 2 µCi/mL
³H-biotin (Amersham) and 0.6 mM IPTG (for pELM3 clones) or
0.4% L-arabinose instead of glycerol (for pBAD/MBP-N clones).
Growth was continued for an additional several hours, and then
the cells were harvested and lysed with SDS protein gel
buffer. The samples were run on a 4-20% gradient acrylamide
gel, and fluorographed using Amplify (Amersham) and X-ray
film.

Example 5 
Colony Lifts

Colony lifts were performed in duplicate
essentially as described (Sambrook, 1989, Molecular Cloning: A
Laboratory Manual (CSH Press)), except that the inducing
plates contained 10 µM biotin and 0.3 mM IPTG (for pELM3
clones) or 0.2% L-arabinose (for pBAD/MBP-N clones). The
blocking agent was 5% BSA, and the probe was 1/5000 diluted 
streptavidin-alkaline phosphatase conjugate (Gibco BRL). 

Example 6
Overexpression of BirA
The birA gene was cloned under the control of
inducible promoters on two different plasmids. The birA gene
was amplified from the plasmid pBA22 (see Barker and Campbell,
1981, J. Mol. Biol. 146:469-492) using primers ON-1589 (SEQ.
ID NO:101) (5' TAC ACT GCT AGC TAA CTA ATG GAG GAT ACA TAA ATG
AAG GAT AAC ACC GTG CCA CTG 3') and ON-1590 (SEQ. ID NO:102)
(5' GTA TCA GAG CTC TTA TTT TTC TGC ACT ACG CAG GGA 3') in a
polymerase chain reaction (PCR). The fragment was digested
with SacI and NheI and cloned into SacI, NheI digested plasmid
pJS100, placing birA under control of the araBAD promoter.
The resulting plasmid, called pJS170, contains a
pBR322-derived replication origin, an ampicillin resistance gene, and
the araC gene, which encodes a regulator of the araBAD
promoter. Induction of birA expression from this plasmid in
LB + 0.2% arabinose allows expression of large amounts of BirA
protein.
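
The cloning design implied by the two primers can be checked by locating the restriction sites and the stop codon they carry. This sketch assumes the primer sequences as read from the text with the digit "6" taken as an OCR rendering of "G"; the recognition sequences GCTAGC (NheI) and GAGCTC (SacI) are the standard ones, and the helper names are illustrative.

```python
SITES = {"NheI": "GCTAGC", "SacI": "GAGCTC"}

# Primer sequences as read from the text (assumed OCR correction: 6 -> G).
ON_1589 = ("TAC ACT GCT AGC TAA CTA ATG GAG GAT ACA TAA ATG "
           "AAG GAT AAC ACC GTG CCA CTG").replace(" ", "")
ON_1590 = "GTA TCA GAG CTC TTA TTT TTC TGC ACT ACG CAG GGA".replace(" ", "")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


nhe_pos = ON_1589.find(SITES["NheI"])     # site near the 5' end of ON-1589
sac_pos = ON_1590.find(SITES["SacI"])     # site near the 5' end of ON-1590
coding_end = reverse_complement(ON_1590)  # read in the sense orientation

print(nhe_pos, sac_pos, coding_end[:24])
```

Read in the sense orientation, ON-1590 ends the coding sequence with a TAA stop codon directly ahead of the SacI site, while ON-1589 places an NheI site upstream of the ATG that begins the amplified coding sequence.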

The birA gene fragment was also subcloned into SacI,
SpeI digested plasmid pIQCAT-LC9. This places birA under
control of the tac promoter, which is inducible with IPTG.
This plasmid, called pJS169, also contains a p15A replication
origin, a chloramphenicol resistance gene, and the lacIq
allele of the lacI gene, which encodes a repressor of the lac
or tac promoters. The p15A replication origin permits this
plasmid to replicate in the same cell as pBR322-derived
plasmids. Thus, BirA can be overexpressed in the same cell
that is expressing the biotinylation target. Cells carrying
pJS169 grown in LB + 0.3 mM IPTG overexpress BirA to a lesser
extent than cells carrying pJS170 induced with 0.2% arabinose.

Example 7 

Enhanced Biotinylation in an 
E. coli strain over-producing BirA

The efficiency of MBP-peptide fusion biotinylation 
was determined under two growth conditions using a band shift
assay. This assay was performed by mixing deglycosylated 
avidin (UltraAvidin, Leinco Technologies) with a crude cell 
lysate from an E. coli strain that overexpressed the MBP- 
peptide fusion. The mix was electrophoresed on a 4-20% 
acrylamide nondenaturing gel compared to the lysate without 
avidin. Comparison of the two lanes permitted quantitation of 
the efficiency of biotinylation by observation of the band 
shift caused by the added avidin. Fusion proteins expressed 
in a strain carrying pJS169 (with birA induced with 0.3 mM
IPTG) in LB media containing 10 µM biotin were biotinylated to
a greater extent than those expressed in the absence of extra 
BirA and added biotin. 
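
One way to turn the band shift comparison into a number is to compare the intensity of the unshifted fusion band with and without avidin. The sketch below is a hypothetical illustration: the text does not describe how the bands were quantified, and the densitometry values and function name are invented for the example.

```python
def fraction_biotinylated(unshifted_without_avidin, unshifted_with_avidin):
    """Estimate the biotinylated fraction from unshifted band intensities.

    Assumption: the band intensity lost from the unshifted position upon
    adding avidin corresponds to biotinylated fusion protein.
    """
    if unshifted_without_avidin <= 0:
        raise ValueError("reference band intensity must be positive")
    fraction = 1.0 - unshifted_with_avidin / unshifted_without_avidin
    return min(1.0, max(0.0, fraction))  # clamp against measurement noise


# Hypothetical densitometry readings (arbitrary units).
print(fraction_biotinylated(1000.0, 250.0))
```

Clamping to the range 0 to 1 is a design choice that keeps noisy readings from producing a negative or greater-than-one estimate.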

Example 8 

Biotinylation of recombinant proteins in vitro
1. BirA overexpression and purification

BirA can be purified either by published procedures 
(see Buoncristiani and Otsuka, 1988, J. Biol. Chem.
263(2): 1013-1016), or by the following procedure. 

A single colony of E. coli strain BL21 transformed 
with pJS169 was grown overnight in 50 ml of LB + ampicillin.
This culture was inoculated 1:100 into 1 liter of LB +
ampicillin and grown at 37°C with shaking until the OD600 =
0.5. After induction with 0.4 mM IPTG, the cells were grown
an additional 4 hours, harvested by centrifugation,
resuspended in 20 mM Tris-HCl pH 7.4 + 5 mM DTT (TD5), and lysed
by sonication. Cellular debris was removed by centrifugation
and the supernatant diluted to 100 ml total volume with TD5
buffer.

Crude supernatant was loaded onto a 10 ml Blue
Sepharose FF column (Pharmacia) and washed through with TD
(20 mM Tris-HCl pH 7.4) until the A280 of the column flow-through
was about 0. This column was eluted with a 100 ml
gradient of 0-1.5 M NaCl in TD and 2 ml fractions collected.
BirA-containing fractions were pooled and dialyzed against TD
until the NaCl concentration was about 15 mM.
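
For planning the dialysis step it can help to know the approximate NaCl concentration of each collected fraction. The following sketch assumes an ideal linear gradient with no column dead volume or mixing delay, which a real column will not match exactly; the function names are illustrative.

```python
def gradient_concentration(volume_ml, total_ml=100.0, start_m=0.0, end_m=1.5):
    """NaCl concentration (M) after volume_ml of an ideal linear gradient."""
    return start_m + (end_m - start_m) * volume_ml / total_ml


def fraction_midpoints(fraction_ml=2.0, total_ml=100.0):
    """(fraction number, NaCl concentration at the fraction midpoint)."""
    count = int(total_ml / fraction_ml)
    return [(i + 1, gradient_concentration((i + 0.5) * fraction_ml, total_ml))
            for i in range(count)]


fractions = fraction_midpoints()
for number, conc in fractions[:3]:
    print(number, round(conc, 3))
```

With 2 ml fractions over the 100 ml gradient described above, this yields 50 fractions, each spanning a 30 mM step in nominal NaCl concentration.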

The dialysate was concentrated using an Amicon YM30,
and then loaded over a 5 ml S Sepharose FF column (Pharmacia) and
washed through with TD1 (20 mM Tris-HCl pH 7.4 + 1 mM DTT).
Protein was eluted with a 50 ml gradient of 0-350 mM NaCl in
TD1.

BirA-containing fractions were pooled, bound to a
Biotin-sepharose column, washed with TD1/150 mM NaCl, and
eluted with TD1/150 mM NaCl + 2 mM biotin. BirA-containing
fractions were dialyzed over YM30 against TD1/150 mM NaCl to a
final volume of 10 ml.

2. Biotinylation in vitro using purified BirA enzyme. 

Proteins fused to one of the peptides of the
invention were biotinylated in vitro at 37°C in a buffer
containing: RPMI medium 1640 (Gibco-BRL) supplemented with 5 mM
ATP, 5 mM MgCl2, and 10 µM biotin.
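
Setting up such a reaction amounts to a dilution calculation for each supplement. The sketch below is illustrative only: the final concentrations are those read from the text (5 mM ATP, 5 mM MgCl2, 10 µM biotin), while the stock concentrations and the 1 mL reaction volume are hypothetical values chosen for the example.

```python
def stock_volume_ul(final_mM, stock_mM, final_volume_ul):
    """Volume of stock needed so that C_stock * V_stock = C_final * V_final."""
    return final_mM * final_volume_ul / stock_mM


FINAL_VOLUME_UL = 1000.0  # hypothetical reaction volume

# name: (final concentration in mM from the text, hypothetical stock in mM)
SUPPLEMENTS = {
    "ATP": (5.0, 100.0),
    "MgCl2": (5.0, 1000.0),
    "biotin": (0.010, 1.0),
}

volumes = {name: stock_volume_ul(final, stock, FINAL_VOLUME_UL)
           for name, (final, stock) in SUPPLEMENTS.items()}
remaining_ul = FINAL_VOLUME_UL - sum(volumes.values())  # medium plus protein

for name, vol in volumes.items():
    print(name, round(vol, 2))
print("medium + protein", round(remaining_ul, 2))
```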

Although the foregoing invention has been described 
in some detail by way of illustration and example for purposes 
of clarity and understanding, it will be apparent that certain 
changes and modifications may be practiced within the scope of
the appended claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Affymax Technologies N.V.
(ii) TITLE OF INVENTION: Biotinylation of Proteins
(iii) NUMBER OF SEQUENCES: 102

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Townsend and Townsend Khourie and Crew
(B) STREET: One Market Plaza, Steuart Tower
(C) CITY: San Francisco
(D) STATE: California
(E) COUNTRY: USA
(F) ZIP: 94105

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: PCT
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/099,991
(B) FILING DATE: 30-JUL-1993

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Smith, William M.
(B) REGISTRATION NUMBER: 30,223
(C) REFERENCE/DOCKET NUMBER: 1038.1

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 415-326-2400

(B) TELEFAX: 415-326-2422

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Region
(B) LOCATION: one-of(2, 3)
(D) OTHER INFORMATION: /note= "Xaa at position 2 is any
amino acid; at position 3 is any amino acid other
than Leu, Val, Ile, Trp, Phe or Tyr."

(ix) FEATURE:
(A) NAME/KEY: Region
(B) LOCATION: one-of(5, 6, 7, 8)
(D) OTHER INFORMATION: /note= "Xaa at position 5 is Phe or
Leu; at position 6 is Glu or Asp; at position 7 is
Ala, Gly, Ser or Thr; at position 8 is Gln or

(ix) FEATURE:
(A) NAME/KEY: Region
(B) LOCATION: one-of(10, 11, 12, 13)
(D) OTHER INFORMATION: /note= "Xaa at position 10 is Ile,
Met or Val; at position 11 is Glu, Leu, Val, Tyr
or Ile; at position 12 is Trp, Tyr, Val, Phe, Leu

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Leu Xaa Xaa Ile Xaa Xaa Xaa Xaa Lys Xaa Xaa Xaa Xaa
1               5                   10

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Xaa Ile Val Xaa Ala Met Lys Met Xaa

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Gly Gln Thr Val Leu Val Leu Glu Ala Met Lys Met Glu Thr Glu Ile
1               5                   10                  15

Asn Ala Pro Thr Asp Gly
            20

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Gly Glu Val Leu Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile
1               5                   10                  15

Arg Ala Ala Gln Ala Gln
            20

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

y --^ ---i^- Cys Phe Ala Glu Ile Glu Val Met Lys Met Val Met Thr Leu

Thr Ala Gly Glu Ser Gly



WO 95/04069                                   PCT/US94/08528



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 amino acids 
. (B) TYPE: amino acid 

(C) STRANDEONESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



Gly Asn Thr Leu Cys He Val Glu Ala Met Lya Met Met Asn Gin He 



15 



Glu Ala Asp Lys Ser Gly 
20 
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(2) INFORMATION FOR SEQ ID NO: 7; 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Gly Gln Pro Val Ala Val Leu Ser Ala Met Lys Met Glu Met Ile Ile

Ser Ser Pro Ser Asp Gly

20
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Gly Gln Pro Leu Cys Val Leu Ser Ala Met Lys Met Glu Thr Val Val
1 5 10 15

Thr Ser Pro Met Glu Gly
20
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Gly Gln Pro Leu Val Leu Ser Ala Met Lys Met Glu Thr Val Val Thr

1 5 10 15

Ser Pro Val Thr Glu
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 

Gly Ala Pro Leu Val Leu Ser Ala Met Lys Met Glu Thr Val Val Thr 
1 5 10 15

Ala Pro Arg 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Gly Gln Glu Ile Cys Val Ile Glu Ala Met Lys Met Gln Asn Ser Met

1 5 10 15

Thr Ala Gly Lys Thr Gly

20
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Gly Gln Pro Val Leu Val Leu Glu Ala Met Lys Met Glu His Val Val
1 5 10 15

Lys Ala Pro Ala Asn Gly
20
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
Ala Met Lys Met

1
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 26 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Leu Glu Glu Val Asp Ser Thr Ser Ser Ala Ile Phe Asp Ala Met
1 5 10 15

Met Val Trp Ile Ser Pro Thr Glu Phe Arg
20



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Ser Lys Cys Ser Tyr Ser His Asp Leu Lys Ile Phe Glu Ala Gln Lys
1 5 10 15

Met Leu Val His Ser Tyr Leu Arg Val Met Tyr Asn Tyr

20 25
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Met Ala Ser Ser Asp Asp Gly Leu Leu Thr Ile Phe Asp Ala Thr Lys



Met Met Phe Ile Arg Thr

20
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Ser Tyr Met Asp Arg Thr Asp Val Pro Thr Ile Leu Glu Ala Met
1 5 10 15

Met Glu Leu His Thr Thr Pro Trp Ala Cys Arg
20 25
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Ser Val Val Pro Glu Pro Gly Trp Asp Gly Pro Phe Glu Ser Met Lys

1 5 10 15

Met Val Tyr His Ser Gly Ala Gln Ser Gly Gln
20 25
(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Ser Phe Pro Pro Ser Leu Pro Asp Lys Asn Ile Phe Glu Ala Met Lys
1 5 10 15

Met Tyr Val Ile Thr
20
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Val Arg His Leu Pro Pro Pro Leu Pro Ala Leu Phe Asp Ala Met Lys

1 5 10 15

\-;'''^ ' : ^-^^ Glu Phe Val Thr Ser Val Gln Phe
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Asp Met Thr Met Pro Thr Gly Met Thr Lys Ile Phe Glu Ala Met Lys

1 5 10 15

Met Glu Val Ser Thr
20
(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

Ala Thr Ala Gly Pro Leu His Glu Pro Asp Ile Phe Leu Ala Met Lys
1 5 10 15 

Met Glu Val Val Asp Val Thr Asn Lys Ala Gly Gln
20 25
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
Ser Met Trp Glu Thr Leu Asn Ala Gln Lys Thr Val Leu Leu
1 5 10
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

Ser His Pro Ser Gln Leu Met Thr Asn Asp Ile Phe Glu Gly Met Lys

Met Leu Tyr His

20
(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Ser Ile Glu Arg Gly Gly Ser Thr His Lys Ile Leu Ala Ala Met Lys
1 5 10 15

Met Tyr Gln Val Ser Thr Pro Ser Cys Ser
20 25
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

yfy: { • \:V .;:v^:^ -T Glu Leu Ser Lys Leu Asp Ala Thr Ile Phe Ala Ala Met Lys

• xW^v" Met Pro Gly






(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Val Met Glu Thr Gly Leu Asp Leu Arg Pro Ile Leu Thr Gly Met Lys
1 5 10 15 

Met Asp Trp Ile Pro Lys
20
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(ix) FEATURE: 
(A) NAME/KEY: Region

(B) LOCATION: one-of(1, 2, 3, 11, 12, 13, 14, 15)
(D) OTHER INFORMATION: /note= "Xaa is any amino acid coded for by

the NNK codon."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Xaa Xaa Xaa Ile Phe Glu Ala Met Lys Met Xaa Xaa Xaa Xaa Xaa

1 5 10 15
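
The note in the FEATURE entry above defines Xaa by reference to the NNK codon. Under the standard nucleotide degeneracy codes, N is any of A, C, G, or T, and K is G or T. The following minimal Python sketch is illustrative only and is not part of the sequence listing; it enumerates the NNK codons and the residues they encode, assuming the standard genetic code. The names `CODON_TABLE` and `nnk_codons` are hypothetical and introduced here for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering
# (assumption: the standard table applies to the host organism).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Return every codon matching NNK (N = A/C/G/T, K = G/T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


codons = nnk_codons()
encoded = {CODON_TABLE[c] for c in codons}
amino_acids = sorted(encoded - {"*"})          # one-letter residue codes
stops = sorted(c for c in codons if CODON_TABLE[c] == "*")

print(len(codons), "NNK codons")
print(len(amino_acids), "amino acids:", "".join(amino_acids))
print("stop codons:", stops)
```

Under these assumptions the NNK set comprises 32 codons that cover all 20 amino acids and include a single stop codon (TAG).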






(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Leu His His Ile Leu Asp Ala Gln Lys Met Val Trp Asn His Arg
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
Pro Gln Gly Ile Phe Glu Ala Gln Lys Met Leu Trp Arg Ser
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Leu Ala Gly Thr Phe Glu Ala Leu Lys Met Ala Trp His Glu His
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Leu Asn Ala Ile Phe Glu Ala Met Lys Met Glu Tyr Ser Gly
(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Leu Gly Gly Ile Phe Glu Ala Met Lys Met Glu Leu Arg Asp
1 5 10
(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
Leu Leu Arg Thr Phe Glu Ala Met Lys Met Asp Trp Arg Asn Gly
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Leu Ser Thr Ile Met Glu Gly Met Lys Met Tyr Ile Gln Arg Ser
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

Leu Ser Asp Ile Phe Glu Ala Met Lys Met Val Tyr Arg Pro Cys

1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

Leu Glu Ser Met Leu Glu Ala Met Lys Met Gln Trp Asn Pro Gln
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Leu Ser Asp Ile Phe Asp Ala Met Lys Met Val Tyr Arg Pro Gln

1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 40:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

Leu Ala Pro Phe Phe Glu Ser Met Lys Met Val Trp Arg Glu His
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

Leu Lys Gly Ile Phe Glu Ala Met Lys Met Glu Tyr Thr Ala Met
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Leu Glu Gly Ile Phe Glu Ala Met Lys Met Glu Tyr Ser Asn Ser
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

' 7^:^- /1^ Ala Met Lys Met Glu Trp Leu Pro Lys
(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 



Val Phe Asp Ile Leu Glu Ala Gln Lys Val Val Thr Leu Arg Phe
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

Leu Val Ser Met Phe Asp Gly Met Lys Met Glu Trp Lys Thr Leu

1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Leu Glu Pro Ile Phe Glu Ala Met Lys Met Asp Trp Arg Leu Glu
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

Phe Glu Gly Met Lys Met Glu Phe Val Lys Pro







(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Gly Gly Ile Glu Ala Gln Lys Met Leu Leu Tyr Arg Gly Asn
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: one-of(1, 2, 3, 4, 5, 6, 14, 15, 16, 17, 18)

(D) OTHER INFORMATION: /note= "Xaa is any amino acid coded for by

the NNK codon."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

Xaa Xaa Xaa Xaa Xaa Xaa Ile Phe Glu Ala Met Lys Met Xaa Xaa Xaa

Xaa Xaa
(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

Arg Pro Val Leu Glu Asn Ile Phe Glu Ala Met Lys Met Glu Val Trp

Lys Pro
(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

Arg Ser Pro Ile Ala Glu Ile Phe Glu Ala Met Lys Met Glu Tyr Arg
1 5 10 15

Glu Thr
(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

Asp Gly Val Leu Phe Pro Ile Phe Glu Ala Met Lys Met Ile Arg Leu

1 5 10 15

Glu Thr
(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

Val Ser Arg Thr Met Thr Asn Phe Glu Ala Met Lys Met Ile Tyr His
1 5 10 15

Asp Leu
(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:

Asp Val Leu Leu Pro Thr Val Phe Glu Ala Met Lys Met Tyr Ile Thr

1 5 10 15

Lys
(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

Pro Asn Asp Leu Glu Arg Ile Phe Asp Ala Met Lys Ile Val Thr Val
1 5 10 15

His Ser
(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

Thr Arg Ala Leu Leu Glu Ile Phe Asp Ala Gln Lys Met Leu Tyr Gln
1 5 10 15

His Leu



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 

Gly Asp Lys Leu Thr Glu Ile Phe Glu Ala Met Lys Ile Gln Trp Thr

1 5 10 15

Ser Gly
(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
Leu Glu Gly Leu Arg Ala Val Phe Glu Ser Met Lys Met Glu Leu Ala

1 5 10 15

Asp Glu
(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

Val Ala Asp Ser His Asp Thr Phe Ala Ala Met Lys Met Val Trp Leu

1 5 10 15

Asp Thr
(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Gly Leu Pro Leu Gln Asp Ile Leu Glu Ser Met Lys Ile Val Met Thr
Ser Gly
(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 

Arg Val Pro Leu Glu Ala Ile Phe Glu Gly Ala Lys Met Ile Trp Val
Pro Asn Asn
(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

Gln Pro Ser Leu Leu Ser Ile Phe Glu Ala Met Lys Met Gln Ala Ser
1 5 10 15

Leu Met
(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

Leu Leu Glu Leu Arg Ser Asn Phe Glu Ala Met Lys Met Glu Trp Gln

1 5 10 15

Ile Ser
(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

Asp Glu Glu Leu Asn Gln Ile Phe Glu Ala Met Lys Met Tyr Pro Leu
1 5 10 15

Val His Val Thr Lys
20
(2) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: one-of(1..10, 12..21)

(D) OTHER INFORMATION: /note= "Xaa is any amino acid coded for by

the NNK codon."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Xaa Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa
20
(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

Ser Asn Leu Val Ser Leu Leu His Ser Gln Lys Ile Leu Trp Thr Asp
1 5 10 15

Pro Gln Ser Phe Gly
20
(2) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

Leu His Asp Phe Leu Asn Ala Gln Lys Val Glu Leu Tyr Pro

1 5 10 15

Val Thr Ser Ser Gly
(2) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
Ser Asp Ile Asn Ala Leu Leu Ser Thr Gln Lys Ile Tyr Trp Ala His
1 5 10 15
(2) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(ix) FEATURE: 

(A) NAME/KEY: Modified-site

(B) LOCATION: one-of(3, 4, 6, 7, 18, 19, 20)

(D) OTHER INFORMATION: /note= "Xaa is any amino acid coded for by

the NNK codon."

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(B) LOCATION: one-of(9, 10, 14, 17)
(D) OTHER INFORMATION: /note= "Xaa at position 9 is Phe or
Leu; at position 10 is Glu or Asp; at position 14
is Met or Ile; at position 17 is His or Arg."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:

Met Ala Xaa Xaa Leu Xaa Xaa Ile Xaa Xaa Ala Gln Lys Xaa Glu Trp

1 5 10 15

Xaa Xaa Xaa Xaa Gly Gly Ser

20
(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:

Met Ala Ser Ser Leu Arg Gln Ile Leu Asp Ser Gln Lys Met Glu Trp
1 5 10 15

Arg Ser Asn Ala Gly Gly Ser
20
(2) INFORMATION FOR SEQ ID NO: 75:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

Met Ala H^^^^^ Leu Val Pro Ile Phe Asp Ala Gln Lys Ile Glu Trp



Arg Asp Pro Phe Gly Gly Ser

20
(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 

Met Gly Pro Asp Leu Val Asn Ile Phe Glu Ala Gln Lys Ile Glu
1 5 10 15 

His Pro Leu Thr Gly Gly Ser 



20
(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:
(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:
Met Ala Gly Gly Leu Asn Asp Ile Phe Glu Ala Gln Lys Ile Glu Trp

1 5 10 15

His Glu Asp Thr Gly Gly Ser
20
(2) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:

Met Ser Ser Tyr Leu Ala Pro Ile Phe Glu Ala Gln Lys Ile Glu Trp
1 5 10 15

His Ser Ala Tyr Gly Gly Ser
20
(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80:

Met Ala Lys Ala Leu Gln Lys Ile Leu Glu Ala Gln Lys Met Glu Trp
1 5 10 15

Arg Ser His Pro Gly Gly Ser
(2) INFORMATION FOR SEQ ID NO: 81:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 

Met Ala Phe Gln Leu Cys Lys Ile Phe Tyr Ala Gln Lys Met Glu Trp
1 5 10 15

His Gly Val Gly Gly Gly Ser

20
(2) INFORMATION FOR SEQ ID NO: 82:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 




His Val Gly Lys Gly Gly Ser
20
(2) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:

Met Ala Gln Gln Leu Pro Asp Ile Phe Asp Ala Gln Lys Ile Glu Trp
1 5 10 15

Arg Ile Ala Gly Gly Gly Ser
20
(2) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

Met Ala Gln Arg Leu Phe His Ile Leu Asp Ala Gln Lys Ile Glu Trp
1 5 10 15

His Gly Pro Lys Gly Gly Ser
(2) INFORMATION FOR SEQ ID NO: 85:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

Met Ala Gly Cys Leu Gly Pro Ile Phe Glu Ala Gln Lys Met Glu Trp

1 5 10 15

Arg His Phe Val Gly Gly Ser
20
(2) INFORMATION FOR SEQ ID NO: 86:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

Met Ala Trp Ser Leu Lys Pro Ile Phe Asp Ala Gln Lys Ile Glu Trp
1 5 10 15

His Ser Pro Gly Gly Gly Ser
20

(2) INFORMATION FOR SEQ ID NO: 87:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

Met Ala Leu Gly Leu Thr Arg Ile Leu Asp Ala Gln Lys Ile Glu Trp
1 5 10 15

His Arg Asp Ser Gly Gly Ser
20

(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

Met Ala Gly Ser Leu Arg Gln Ile Leu Asp Ala Gln Lys Ile Glu Trp
1 5 10 15

Arg Arg Pro Leu Gly Gly Ser
20

(2) INFORMATION FOR SEQ ID NO: 89:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:

Met Ala Asp Arg Leu Ala Tyr Ile Leu Glu Ala Gln Lys Met Glu Trp
1 5 10 15

His Pro His Lys Gly Gly Ser
20

(2) INFORMATION FOR SEQ ID NO: 90:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 

Val Val His Gly Glu Gln Val Gly Gly Glu Ala Ser Gly Gly Gly
1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 91:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 103 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 
GAGGTGGTNN KNNKNNKNNK NNKNNKNNKN NKNNKNNKAT CGTTNNKGCT ATGAAAATGN 60
NKNNKNNKNN KNNKNNKNNK NNKNNKNNKT AACTAAGTAA AGC 103

(2) INFORMATION FOR SEQ ID NO: 92:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 
ACCACCTCCG G 11

(2) INFORMATION FOR SEQ ID NO: 93:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:
TTACTTAGTT A 11

(2) INFORMATION FOR SEQ ID NO: 94:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 67 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 
GAGGTGGTNN KNNKNNKATC TTTGAAGCTA TGAAAATGNN KNNKNNKNNK NNKTAACTAA 60
GTAAAGC 67
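The degenerate oligonucleotides listed above (for example SEQ ID NO: 91 and SEQ ID NO: 94) are written with nucleotide ambiguity codes. As a minimal sketch, assuming the standard IUPAC meanings of those codes (N = A, C, G, or T; K = G or T) and the standard genetic code, neither of which is defined in this listing, the following Python enumerates the codons covered by a single NNK position. The names `expand_codon` and `encoded_residues` are illustrative only.

```python
from itertools import product

# Assumed: standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumed: standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(codon):
    """Return every concrete codon matched by a degenerate three-base codon."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in codon))]


def encoded_residues(codon):
    """Return the set of one-letter residues ('*' = stop) the codon can encode."""
    return {CODON_TABLE[c] for c in expand_codon(codon)}


if __name__ == "__main__":
    codons = expand_codon("NNK")
    residues = encoded_residues("NNK")
    print(len(codons), "codons;", "".join(sorted(residues)))
```

Under these assumptions an NNK position expands to 32 codons that cover all 20 amino acids and a single stop codon (TAG).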






(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 76 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95: 
GAGGTGGTNN KNNKNNKNNK TTGAAGCTAT GAAAATGNNK NNKNNKNNKN

NKTAACTAAG TAAAGC

(2) INFORMATION FOR SEQ ID NO: 96:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 85 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 
GAGGTGGTNN KNNKNNKNNK NNKNNKNNKN NKNNKNNKAA ANNKNNKNNK NNKNNKNNKN 60
NKNNKNNKNN KTAACTAAGT AAAGC 85

(2) INFORMATION FOR SEQ ID NO: 97:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 96 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 
CTAGCTAACT AATGGAGGAT ACATAAATGG CTNNKNNKCT GNNKNVKATT TTNGANGCTC 60
ARAAAATNGA ATGGCRYNNK NNKNNKGGTG GTAGCC 96

(2) INFORMATION FOR SEQ ID NO: 98:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 
TCCTCCATTA GTTAG 15

(2) INFORMATION FOR SEQ ID NO: 99:



(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:
TCGAGGCTAC CACC 14

(2) INFORMATION FOR SEQ ID NO: 100:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:
GAATTCAATT GTGAGCGCTC ACAATTGAAT TC 32

(2) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:
TACAGTGCTA GCTAACTAAT GGAGGATACA TAAATGAAGG ATAACACCGT GCCACTG 57

(2) INFORMATION FOR SEQ ID NO: 102:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (primer) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:
GTATCAGAGC TCTTATTTTT CTGCACTACG CAGGGA 36
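The peptide entries in this listing use three-letter residue names, while the claims recite the same peptides in one-letter code. A minimal sketch, assuming the standard three-letter to one-letter amino acid abbreviations, converts one form to the other so that the two can be compared. The function name `to_one_letter` is illustrative, and SEQ ID NO: 83 is used as the worked example.

```python
# Assumed: standard three-letter to one-letter amino acid abbreviations.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence):
    """Convert a whitespace-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_sequence.split())


# SEQ ID NO: 83 as given in the sequence listing.
SEQ_83 = (
    "Met Ala Gln Gln Leu Pro Asp Ile Phe Asp Ala Gln Lys Ile Glu Trp "
    "Arg Ile Ala Gly Gly Gly Ser"
)

if __name__ == "__main__":
    print(to_one_letter(SEQ_83))  # prints MAQQLPDIFDAQKIEWRIAGGGS
```

The result has 23 residues, matching the LENGTH field of the entry and the one-letter form recited for SEQ ID NO: 83 in the claims.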



WO 95/04069 PCT/US94/08528

WHAT IS CLAIMED IS:

1. A method for biotinylating a protein, said method
comprising:

(a) constructing a recombinant DNA expression vector that
encodes a fusion protein comprising said protein and a
biotinylation peptide less than 50 amino acids in length;

(b) transforming a recombinant host cell with said vector; and

(c) culturing said host cell in the presence of biotin or a
biotin analogue and under conditions such that said fusion
protein and a biotinylation enzyme are expressed, resulting in
biotinylation of said fusion protein.

2. The method of Claim 1, wherein the host cell is E. coli.

3. The method of Claim 1, wherein said biotinylation peptide
comprises an amino acid sequence defined by:
LX1X2IX3X4X5X6KX7X8X9X10 (SEQ. ID NO:1), wherein X1 is any
amino acid; X2 is any amino acid other than L, V, I, W, F, Y;
X3 is F or L; X4 is E or D; X5 is A, G, S, or T; X6 is Q or M;
X7 is I, M, or V; X8 is E, L, V, Y, or I; X9 is W, Y, V, F, L,
or I; and X10 is R, H, or any amino acid other than D or E.

4. A method for biotinylating a protein, said method
comprising:

(a) constructing a recombinant DNA expression vector that
encodes a fusion protein comprising said protein and a
biotinylation peptide less than 50 amino acids in length;

(b) producing said fusion protein encoded by said vector either
by transforming a recombinant host cell with said vector and
culturing host cells transformed with the vector or by
incubating said vector in a cell-free transcription and
translation system; and

(c) incubating said fusion protein in a reaction mixture
comprising biotin or a biotin analogue and a biotinylation
enzyme, resulting in biotinylation of said fusion protein.

5. A recombinant DNA vector that comprises a nucleic acid that
encodes a biotinylation peptide less than 50 amino acids in
length.

6. The recombinant DNA vector of Claim 5, wherein said
biotinylation peptide is LX1X2IX3X4X5X6KX7X8X9X10 (SEQ. ID
NO:1), where X1 is any amino acid; X2 is any amino acid other
than L, V, I, W, F, Y; X3 is F or L; X4 is E or D; X5 is A, G,
S, or T; X6 is Q or M; X7 is I, M, or V; X8 is E, L, V, Y, or
I; X9 is W, Y, V, F, L, or I; and X10 is preferably R, H, or
any amino acid other than D or E.

7. The recombinant DNA vector of Claim 5, wherein said
biotinylation peptide is selected from the group consisting of
LEEVDSTSSAIFDAMKMVWISPTEFR (SEQ. ID NO: 14),
QGDRDETLPMILRAMKMEVYNPGGHEK (SEQ. ID NO: 15),
SKCSYSHDLKIFEAQKMLVHSYLRVMYNY (SEQ. ID NO: 16),
MASSDDGLLTIFDATKMMFIRT (SEQ. ID NO: 17),
SYMDRTDVPTILEAMKMELHTTPWACR (SEQ. ID NO: 18),
SFPPSLPDKNIFEAMKMYVIT (SEQ. ID NO: 19),
SWPEPGWDGPFESMKMVYHSGAQSGQ (SEQ. ID NO: 20),
VRHLPPPLPALFDAMKMEFVTSVQF (SEQ. ID NO: 21),
DMTMPTGMTKIFEAMKMEVST (SEQ. ID NO: 22),
ATAGPLHEPDIFIAMKMEWDVTNKAGQ (SEQ. ID NO: 23),
SMWETLNAQKTVLL (SEQ. ID NO: 24),
SHPSQLMTNDIFEGMKMLYH (SEQ. ID NO: 25),
TSELSKLDATIFAAMKMQWWNPG (SEQ. ID NO: 27),
VMETGLDLRPILTGMKMDWIPK (SEQ. ID NO: 28),
LHHILDAQKMVWNHR (SEQ. ID NO: 30),
PQGIFEAQKMLWRS (SEQ. ID NO: 31),
LAGTFEALKMAWHEH (SEQ. ID NO: 32),
LNAIFEAMKMEYSG (SEQ. ID NO: 33),
LGGIFEAMKMELRD (SEQ. ID NO: 34),
LLRTFEAMKMDWRNG (SEQ. ID NO: 35),
LSTIMEGMKMYIQRS (SEQ. ID NO: 36),
LSDIFEAMKMVYRPC (SEQ. ID NO: 37),
LESMLEAMKMQWNPQ (SEQ. ID NO: 38),
LSDIFDAMKMVYRPQ (SEQ. ID NO: 39),
LAPFFESMKMVWREH (SEQ. ID NO: 40),
LKGIFEAMKMEYTAM (SEQ. ID NO: 41),
LEGIFEAMKMEYSNS (SEQ. ID NO: 42),
LLQTFDAMKMEWLPK (SEQ. ID NO: 43),
VFDILEAQKWTLRF (SEQ. ID NO: 44),
LVSMFDGMKMEWKTL (SEQ. ID NO: 45),
LEPIFEAMKMDWRLE (SEQ. ID NO: 46),
LKEIFEGMKMEFVKP (SEQ. ID NO: 47),
LGGILEAQKMLYRGN (SEQ. ID NO: 48),
RPVLENIFEAMKMEVWKP (SEQ. ID NO: 50),
RSPIAEIFEAMKMEYRET (SEQ. ID NO: 51),
QDSIMPIFEAMKMSWHVN (SEQ. ID NO: 52),
DGVLFPIFESMKMIRLET (SEQ. ID NO: 53),
VSRTMTNFEAMKMIYHDL (SEQ. ID NO: 54),
DVLLPTVFEAMKMYITK (SEQ. ID NO: 55),
PNDLERIFDAMKIVTVHS (SEQ. ID NO: 56),
TRALLEIFDAQKMLYQHL (SEQ. ID NO: 57),
RDVHVGIFEAMKMYTVET (SEQ. ID NO: 58),
GDKLTEIFEAMKIQWTSG (SEQ. ID NO: 59),
LEGLRAVFESMKMELADE (SEQ. ID NO: 60),
VADSHDTFAAMKMVWLDT (SEQ. ID NO: 61),
GLPLQDILESMKIVMTSG (SEQ. ID NO: 62),
RVPLEAIFEGAKMIWVPNN (SEQ. ID NO: 63),
PMISHKNFEAMKMLFVPE (SEQ. ID NO: 64),
KLGLPAMFEAMKMEWHPS (SEQ. ID NO: 65),
QPSLLSIFEAMKMQASLM (SEQ. ID NO: 66),
LLELRSNFEAMKMEWQIS (SEQ. ID NO: 67),
DEELNQIFEAMKMYPLVHVTK (SEQ. ID NO: 68),
SNLVSLLHSQKILWTDPQSFG (SEQ. ID NO: 70),
LFLHDFLNAQKVELYPVTSSG (SEQ. ID NO: 71),
SDINALLSTQKIYWAH (SEQ. ID NO: 72),
MASSLRQILDSQKMEWRSNAGGS (SEQ. ID NO: 74),
MAHSLVPIFDAQKIEWRDPFGGS (SEQ. ID NO: 75),
MGPDLVNIFEAQKIEWHPLTGGS (SEQ. ID NO: 76),
MAFSLRSILEAQKMELRNTPGGS (SEQ. ID NO: 77),
MAGGLNDIFEAQKIEWHEDTGGS (SEQ. ID NO: 78),
MSSYLAPIFEAQKIEWHSAYGGS (SEQ. ID NO: 79),
MAKALQKILEAQKMEWRSHPGGS (SEQ. ID NO: 80),
MAFQLCKIFYAQKMEWHGVGGGS (SEQ. ID NO: 81),
MAGSLSTIFDAQKIEWHVGKGGS (SEQ. ID NO: 82),
MAQQLPDIFDAQKIEWRIAGGGS (SEQ. ID NO: 83),
MAQRLFHILDAQKIEWHGPKGGS (SEQ. ID NO: 84),
MAGCLGPIFEAQKMEWRHFVGGS (SEQ. ID NO: 85),
MAWSLKPIFDAQKIEWHSPGGGS (SEQ. ID NO: 86),
MALGLTRILDAQKIEWHRDSGGS (SEQ. ID NO: 87),
MAGSLRQILDAQKIEWRRPLGGS (SEQ. ID NO: 88), and
MADRLAYILEAQKMEWHPHKGGS (SEQ. ID NO: 89).

8. A purified preparation of a biotinylated protein, said
protein comprising a biotinylation peptide less than 50 amino
acids in length.

9. The purified protein of Claim 8, wherein said
biotinylation peptide is LX1X2IX3X4X5X6KX7X8X9X10 (SEQ. ID
NO:1), where

X1 is any amino acid;

X2 is any amino acid other than L, V, I, W, F, Y;

X3 is F or L; X4 is E or D;

X5 is A, G, S, or T;

X6 is Q or M;

X7 is I, M, or V;

X8 is E, L, V, Y, or I;

X9 is W, Y, V, F, L, or I; and

X10 is R, H, or any amino acid other than D or E.

10. The purified protein of Claim 8, wherein said
biotinylation peptide is selected from the group consisting of
LEEVDSTSSAIFDAMKMVWISPTEFR (SEQ. ID NO: 14),
QGDSDETLPMILRAMKMEVYNPGGHEK (SEQ. ID NO: 15),
SKCSYSHDLKIFEAQKMLVHSYLRVMYNY (SEQ. ID NO: 16),
MASSDDGLLTIFDATKMMFIRT (SEQ. ID NO: 17),
SYMDRTDVPTILEAMKMELHTTPWACR (SEQ. ID NO: 18),
SFPPSLPDKNIFEAMKMYVIT (SEQ. ID NO: 19),
SWPEPGWDGPFESMKMVYHSGAQSGQ (SEQ. ID NO: 20),
VRHLPPPLPALFDAMKMEFVTSVQF (SEQ. ID NO: 21),
DMTMPTGMTKIFEAMKMEVST (SEQ. ID NO: 22),
ATAGPLHEPDIFLAMKMEWDVTNKAGQ (SEQ. ID NO: 23),
SMWETLNAQKTVLL (SEQ. ID NO: 24),
SHPSQLMTNDIFEGMKMLYH (SEQ. ID NO: 25),
TSELSKLDATIFAAMKMQWWNPG (SEQ. ID NO: 27),
VMETGLDLRPILTGMKMDWIPK (SEQ. ID NO: 28),
LHHILDAQKMVWNHR (SEQ. ID NO: 30),
PQGIFEAQKMLWRS (SEQ. ID NO: 31),
LAGTFEALKMAWHEH (SEQ. ID NO: 32),
LNAIFEAMKMEYSG (SEQ. ID NO: 33),
LGGIFEAMKMELRD (SEQ. ID NO: 34),
LLRTFEAMKMDWRNG (SEQ. ID NO: 35),
LSTIMEGMKMYIQRS (SEQ. ID NO: 36),
LSDIFEAMKMVYRPC (SEQ. ID NO: 37),
LESMLEAMKMQWNPQ (SEQ. ID NO: 38),
LSDIFDAMKMVYRPQ (SEQ. ID NO: 39),
LAPFFESMKMVWREH (SEQ. ID NO: 40),
LKGIFEAMKMEYTAM (SEQ. ID NO: 41),
LEGIFEAMKMEYSNS (SEQ. ID NO: 42),
LLQTFDAMKMEWLPK (SEQ. ID NO: 43),
VFDILEAQKWTLRF (SEQ. ID NO: 44),
LVSMFDGMKMEWKTL (SEQ. ID NO: 45),
LEPIFEAMKMDWRLE (SEQ. ID NO: 46),
LKEIFEGMKMEFVKP (SEQ. ID NO: 47),
LGGILEAQKMLYRGN (SEQ. ID NO: 48),
RPVLENIFEAMKMEVWKP (SEQ. ID NO: 50),
RSPIAEIFEAMKMEYRET (SEQ. ID NO: 51),
QDSIMPIFEAMKMSWHVN (SEQ. ID NO: 52),
DGVLFPIFESMKMIRLET (SEQ. ID NO: 53),
VSRTMTNFEAMKMIYHDL (SEQ. ID NO: 54),
DVLLPTVFEAMKMYITK (SEQ. ID NO: 55),
PNDLERIFDAMKIVTVHS (SEQ. ID NO: 56),
TRALLEIFDAQKMLYQHL (SEQ. ID NO: 57),
RDVHVGIFEAMKMYTVET (SEQ. ID NO: 58),
GDKLTEIFEAMKIQWTSG (SEQ. ID NO: 59),
LEGLRAVFESMKMELADE (SEQ. ID NO: 60),
VADSHDTFAAMKMVWLDT (SEQ. ID NO: 61),
GLPLQDILESMKIVMTSG (SEQ. ID NO: 62),
RVPLEAIFEGAKMIWVPNN (SEQ. ID NO: 63),
PMISHKNFEAMKMLFVPE (SEQ. ID NO: 64),
KLGLPAMFEAMKMEWHPS (SEQ. ID NO: 65),
QPSLLSIFEAMKMQASLM (SEQ. ID NO: 66),
LLELRSNFEAMKMEWQIS (SEQ. ID NO: 67),
DEELNQIFEAMKMYPLVHVTK (SEQ. ID NO: 68),
SNLVSLLHSQKILWTDPQSFG (SEQ. ID NO: 70),
LFLHDFLNAQKVELYPVTSSG (SEQ. ID NO: 71),
SDINALLSTQKIYWAH (SEQ. ID NO: 72),
MASSLRQILDSQKMEWRSNAGGS (SEQ. ID NO: 74),
MAHSLVPIFDAQKIEWRDPFGGS (SEQ. ID NO: 75),
MGPDLVNIFEAQKIEWHPLTGGS (SEQ. ID NO: 76),
MAFSLRSILEAQKMELRNTPGGS (SEQ. ID NO: 77),
MAGGLNDIFEAQKIEWHEDTGGS (SEQ. ID NO: 78),
MSSYLAPIFEAQKIEWHSAYGGS (SEQ. ID NO: 79),
MAKALQKILEAQKMEWRSHPGGS (SEQ. ID NO: 80),
MAFQLCKIFYAQKMEWHGVGGGS (SEQ. ID NO: 81),
MAGSLSTIFDAQKIEWHVGKGGS (SEQ. ID NO: 82),
MAQQLPDIFDAQKIEWRIAGGGS (SEQ. ID NO: 83),
MAQRLFHILDAQKIEWHGPKGGS (SEQ. ID NO: 84),
MAGCLGPIFEAQKMEWRHFVGGS (SEQ. ID NO: 85),
MAWSLKPIFDAQKIEWHSPGGGS (SEQ. ID NO: 86),
MALGLTRILDAQKIEWHRDSGGS (SEQ. ID NO: 87),
MAGSLRQILDAQKIEWRRPLGGS (SEQ. ID NO: 88), and
MADRLAYILEAQKMEWHPHKGGS (SEQ. ID NO: 89).

11. A kit for biotinylating a protein, the kit 
comprising a recombinant DNA expression polynucleotide that 
encodes a biotinylation peptide less than 50 amino acids which 
can be fused in frame with a protein by inserting the coding 
sequence for the protein.

12. The kit of Claim 11, wherein said biotinylation
peptide is LX1X2IX3X4X5X6KX7X8X9X10 (SEQ. ID NO:1), where

X1 is any amino acid;

X2 is any amino acid other than L, V, I, W, F, Y;

X3 is F or L; X4 is E or D;

X5 is A, G, S, or T;

X6 is Q or M;

X7 is I, M, or V;

X8 is E, L, V, Y, or I;

X9 is W, Y, V, F, L, or I; and

X10 is R, H, or any amino acid other than D or E.

13. The kit of Claim 11, wherein said biotinylation
peptide is selected from the group consisting of
LEEVDSTSSAIFDAMKMVWISPTEFR (SEQ. ID NO: 14),
QGDRDETLPMILRAMKMEVYNPGGHEK (SEQ. ID NO: 15),
SKCSYSHDLKIFEAQKMLVHSYLRVMYNY (SEQ. ID NO: 16),
MASSDDGLLTIFDATKMMFIRT (SEQ. ID NO: 17),
SYMDRTDVPTILEAMKMELHTTPWACR (SEQ. ID NO: 18),
SFPPSLPDKNIFEAMKMYVIT (SEQ. ID NO: 19),
SWPEPGWDGPFESMKMVYHSGAQSGQ (SEQ. ID NO: 20),
VRHLPPPLPALFDAMKMEFVTSVQF (SEQ. ID NO: 21),
DMTMPTGMTKIFEAMKMEVST (SEQ. ID NO: 22),
ATAGPLHEPDIFLAMKMEWDVTNKAGQ (SEQ. ID NO: 23),
SMWETLNAQKTVLL (SEQ. ID NO: 24),
SHPSQLMTNDIFEGMKMLYH (SEQ. ID NO: 25),
TSELSKLDATIFAAMKMQWWNPG (SEQ. ID NO: 27),
VMETGLDLRPILTGMKMDWIPK (SEQ. ID NO: 28),
LHHILDAQKMVWNHR (SEQ. ID NO: 30),
PQGIFEAQKMLWRS (SEQ. ID NO: 31),
LAGTFEALKMAWHEH (SEQ. ID NO: 32),
LNAIFEAMKMEYSG (SEQ. ID NO: 33),
LGGIFEAMKMELRD (SEQ. ID NO: 34),
LLRTFEAMKMDWRNG (SEQ. ID NO: 35),
LSTIMEGMKMYIQRS (SEQ. ID NO: 36),
LSDIFEAMKMVYRPC (SEQ. ID NO: 37),
LESMLEAMKMQWNPQ (SEQ. ID NO: 38),
LSDIFDAMKMVYRPQ (SEQ. ID NO: 39),
LAPFFESMKMVWREH (SEQ. ID NO: 40),
LKGIFEAMKMEYTAM (SEQ. ID NO: 41),
LEGIFEAMKMEYSNS (SEQ. ID NO: 42),
LLQTFDAMKMEWLPK (SEQ. ID NO: 43),
VFDILEAQKWTLRF (SEQ. ID NO: 44),
LVSMFDGMKMEWKTL (SEQ. ID NO: 45),
LEPIFEAMKMDWRLE (SEQ. ID NO: 46),
LKEIFEGMKMEFVKP (SEQ. ID NO: 47),
LGGILEAQKMLYRGN (SEQ. ID NO: 48),
RPVLENIFEAMKMEVWKP (SEQ. ID NO: 50),
RSPIAEIFEAMKMEYRET (SEQ. ID NO: 51),
QDSIMPIFEAMKMSWHVN (SEQ. ID NO: 52),
DGVLFPIFESMKMIRLET (SEQ. ID NO: 53),
VSRTMTNFEAMKMIYHDL (SEQ. ID NO: 54),
DVLLPTVFEAMKMYITK (SEQ. ID NO: 55),
PNDLERIFDAMKIVTVHS (SEQ. ID NO: 56),
TRALLEIFDAQKMLYQHL (SEQ. ID NO: 57),
RDVHVGIFEAMKMYTVET (SEQ. ID NO: 58),
GDKLTEIFEAMKIQWTSG (SEQ. ID NO: 59),
LEGLRAVFESMKMELADE (SEQ. ID NO: 60),
VADSHDTFAAMKMVWLDT (SEQ. ID NO: 61),
GLPLQDILESMKIVMTSG (SEQ. ID NO: 62),
RVPLEAIFEGAKMIWVPNN (SEQ. ID NO: 63),
PMISHKNFEAMKMLFVPE (SEQ. ID NO: 64),
KLGLPAMFEAMKMEWHPS (SEQ. ID NO: 65),
QPSLLSIFEAMKMQASLM (SEQ. ID NO: 66),
LLELRSNFEAMKMEWQIS (SEQ. ID NO: 67),
DEELNQIFEAMKMYPLVHVTK (SEQ. ID NO: 68),
SNLVSLLHSQKILWTDPQSFG (SEQ. ID NO: 70),
LFLHDFLNAQKVELYPVTSSG (SEQ. ID NO: 71),
SDINALLSTQKIYWAH (SEQ. ID NO: 72),
MASSLRQILDSQKMEWRSNAGGS (SEQ. ID NO: 74),
MAHSLVPIFDAQKIEWRDPFGGS (SEQ. ID NO: 75),
MGPDLVNIFEAQKIEWHPLTGGS (SEQ. ID NO: 76),
MAFSLRSILEAQKMELRNTPGGS (SEQ. ID NO: 77),
MAGGLNDIFEAQKIEWHEDTGGS (SEQ. ID NO: 78),
MSSYLAPIFEAQKIEWHSAYGGS (SEQ. ID NO: 79),
MAKALQKILEAQKMEWRSHPGGS (SEQ. ID NO: 80),
MAFQLCKIFYAQKMEWHGVGGGS (SEQ. ID NO: 81),
MAGSLSTIFDAQKIEWHVGKGGS (SEQ. ID NO: 82),
MAQQLPDIFDAQKIEWRIAGGGS (SEQ. ID NO: 83),
MAQRLFHILDAQKIEWHGPKGGS (SEQ. ID NO: 84),
MAGCLGPIFEAQKMEWRHFVGGS (SEQ. ID NO: 85),
MAWSLKPIFDAQKIEWHSPGGGS (SEQ. ID NO: 86),
MALGLTRILDAQKIEWHRDSGGS (SEQ. ID NO: 87),
MAGSLRQILDAQKIEWRRPLGGS (SEQ. ID NO: 88), and
MADRLAYILEAQKMEWHPHKGGS (SEQ. ID NO: 89).

14. The kit of Claim 11 wherein the expression
polynucleotide is transformed into a host cell.

15. The kit of Claim 12 wherein the host cell
overexpresses a biotin protein ligase.

16. A method for identifying a biotinylating 
enzyme, said method comprising:

(a) on a surface of a substrate, providing a fusion
protein comprising a protein and a biotinylation 
peptide of less than 50 amino acids in length; 

(b) in a predefined region of the surface, 
contacting said fusion protein with an enzyme; and 

(c) determining whether the fusion protein has been 
biotinylated. 

17. The method of Claim 16, wherein said fusion 
protein is contacted with a plurality of different enzymes, 
wherein each of the different enzymes is in a different 
predefined region. 

18. The method of Claim 16, wherein the step of 
determining whether the fusion protein has been biotinylated 
further comprises the steps of:

i) treating the fusion protein with labeled 
streptavidin, wherein the labeled streptavidin binds 
to biotinylated fusion protein; 

ii) washing the substrate to remove unbound labeled
streptavidin; and 

iii) detecting the presence of labeled streptavidin 
which has bound to biotinylated fusion protein. 

19. The method of Claim 16, wherein the substrate 
comprises a support having a plurality of wells.

20. The method of Claim 19, wherein the support 
having a plurality of wells is a 96-well microtiter plate. 

21. The method of Claim 16, wherein said
biotinylation peptide is LX1X2IX3X4X5X6KX7X8X9X10 (SEQ. ID
NO:1), where

X1 is any amino acid;

X2 is any amino acid other than L, V, I, W, F, Y;
X3 is F or L; X4 is E or D;
X5 is A, G, S, or T;
X6 is Q or M;

X7 is I, M, or V;

X8 is L, V, or I;

X9 is W, Y, V, F, L, or I; and

X10 is R, H, or any amino acid other than D or E.

22. The method of Claim 16, wherein said
biotinylation peptide is selected from the group consisting of
LEEVDSTSSAIFDAMKMVWISPTEFR (SEQ. ID NO: 14),
QGDSDETLPMILRAMKMEVYNPGGHEK (SEQ. ID NO: 15),
SKCSYSHDLKIFEAQKMLVHSYLRVMYNY (SEQ. ID NO: 16),
MASSDDGLLTIFDATKMMFIRT (SEQ. ID NO: 17),
SYMDRTDVPTILEAMKMELHTTPWACR (SEQ. ID NO: 18),
SFPPSLPDKNIFEAMKMYVIT (SEQ. ID NO: 19),
SWPEPGWDGPFESMKMVYHSGAQSGQ (SEQ. ID NO: 20),
VRHLPPPLPALFDAMKMEFVTSVQF (SEQ. ID NO: 21),
DMTMPTGMTKIFEAMKMEVST (SEQ. ID NO: 22),
ATAGPLHEPDIFIAMKMEWDVTNKAGQ (SEQ. ID NO: 23),
SMWETLNAQKTVLL (SEQ. ID NO: 24),
SHPSQLMTNDIFEGMKMLYH (SEQ. ID NO: 25),
TSELSKLDATIFAAMKMQWWNPG (SEQ. ID NO: 27),
VMETGLDLRPILTGMKMDWIPK (SEQ. ID NO: 28),
LHHILDAQKMVWNHR (SEQ. ID NO: 30),
PQGIFEAQKMLWRS (SEQ. ID NO: 31),
LAGTFEALKMAWHEH (SEQ. ID NO: 32),
LNAIFEAMKMEYSG (SEQ. ID NO: 33),
LGGIFEAMKMELRD (SEQ. ID NO: 34),
LLRTFEAMKMDWRNG (SEQ. ID NO: 35),
LSTIMEGMKMYIQRS (SEQ. ID NO: 36),
LSDIFEAMKMVYRPC (SEQ. ID NO: 37),
LESMLEAMKMQWNPQ (SEQ. ID NO: 38),
LSDIFDAMKMVYRPQ (SEQ. ID NO: 39),
LAPFFESMKMVWREH (SEQ. ID NO: 40),
LKGIFEAMKMEYTAM (SEQ. ID NO: 41),
LEGIFEAMKMEYSNS (SEQ. ID NO: 42),
LLQTFDAMKMEWLPK (SEQ. ID NO: 43),
VFDILEAQKWTLRF (SEQ. ID NO: 44),
LVSMFDGMKMEWKTL (SEQ. ID NO: 45),
LEPIFEAMKMDWRLE (SEQ. ID NO: 46),
LKEIFEGMKMEFVKP (SEQ. ID NO: 47),
LGGILEAQKMLYRGN (SEQ. ID NO: 48),
RPVLENIFEAMKMEVWKP (SEQ. ID NO: 50),
RSPIAEIFEAMKMEYRET (SEQ. ID NO: 51),
QDSIMPIFEAMKMSWHVN (SEQ. ID NO: 52),
DGVLFPIFESMKMIRLET (SEQ. ID NO: 53),
VSRTMTNFEAMKMIYHDL (SEQ. ID NO: 54),
DVLLPTVFEAMKMYITK (SEQ. ID NO: 55),
PNDLERIFDAMKIVTVHS (SEQ. ID NO: 56),
TRALLEIFDAQKMLYQHL (SEQ. ID NO: 57),
RDVHVGIFEAMKMYTVET (SEQ. ID NO: 58),
GDKLTEIFEAMKIQWTSG (SEQ. ID NO: 59),
LEGLRAVFESMKMELADE (SEQ. ID NO: 60),
VADSHDTFAAMKMVWLDT (SEQ. ID NO: 61),
GLPLQDILESMKIVMTSG (SEQ. ID NO: 62),
RVPLEAIFEGAKMIWVPNN (SEQ. ID NO: 63),
PMISHKNFEAMKMLFVPE (SEQ. ID NO: 64),
KLGLPAMFEAMKMEWHPS (SEQ. ID NO: 65),
QPSLLSIFEAMKMQASLM (SEQ. ID NO: 66),
LLELRSNFEAMKMEWQIS (SEQ. ID NO: 67),
DEELNQIFEAMKMYPLVHVTK (SEQ. ID NO: 68),
SNLVSLLHSQKILWTDPQSFG (SEQ. ID NO: 70),
LFLHDFLNAQKVELYPVTSSG (SEQ. ID NO: 71),
SDINALLSTQKIYWAH (SEQ. ID NO: 72),
MASSLRQILDSQKMEWRSNAGGS (SEQ. ID NO: 74),
MAHSLVPIFDAQKIEWRDPFGGS (SEQ. ID NO: 75),
MGPDLVNIFEAQKIEWHPLTGGS (SEQ. ID NO: 76),
MAFSLRSILEAQKMELRNTPGGS (SEQ. ID NO: 77),
MAGGLNDIFEAQKIEWHEDTGGS (SEQ. ID NO: 78),
MSSYLAPIFEAQKIEWHSAYGGS (SEQ. ID NO: 79),
MAKALQKILEAQKMEWRSHPGGS (SEQ. ID NO: 80),
MAFQLCKIFYAQKMEWHGVGGGS (SEQ. ID NO: 81),
MAGSLSTIFDAQKIEWHVGKGGS (SEQ. ID NO: 82),
MAQQLPDIFDAQKIEWRIAGGGS (SEQ. ID NO: 83),
MAQRLFHILDAQKIEWHGPKGGS (SEQ. ID NO: 84),
MAGCLGPIFEAQKMEWRHFVGGS (SEQ. ID NO: 85),
MAWSLKPIFDAQKIEWHSPGGGS (SEQ. ID NO: 86),
MALGLTRILDAQKIEWHRDSGGS (SEQ. ID NO: 87),
MAGSLRQILDAQKIEWRRPLGGS (SEQ. ID NO: 88), and
MADRLAYILEAQKMEWHPHKGGS (SEQ. ID NO: 89).
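
The consensus sequence of SEQ. ID NO:1 is defined in the claims position by position, which maps naturally onto a regular expression. The following Python is a minimal sketch, assuming the residue sets recited in Claim 3 and a 20-letter amino acid alphabet; the names `any_except`, `CONSENSUS`, and `find_consensus` are illustrative and do not come from the specification.

```python
import re

# Assumed: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def any_except(excluded):
    """Character class for any standard amino acid not in `excluded`."""
    allowed = "".join(a for a in AMINO_ACIDS if a not in excluded)
    return "[" + allowed + "]"


# L X1 X2 I X3 X4 X5 X6 K X7 X8 X9 X10, with sets taken from Claim 3.
CONSENSUS = re.compile(
    "L"
    + any_except("")        # X1: any amino acid
    + any_except("LVIWFY")  # X2: any amino acid other than L, V, I, W, F, Y
    + "I"
    + "[FL]"                # X3
    + "[ED]"                # X4
    + "[AGST]"              # X5
    + "[QM]"                # X6
    + "K"                   # the invariant K of the consensus
    + "[IMV]"               # X7
    + "[ELVYI]"             # X8
    + "[WYVFLI]"            # X9
    + any_except("DE")      # X10: any amino acid other than D or E
)


def find_consensus(peptide):
    """Return the first 13-residue stretch matching the consensus, or None."""
    match = CONSENSUS.search(peptide)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_consensus("MAGSLRQILDAQKIEWRRPLGGS"))  # SEQ. ID NO: 88
    print(find_consensus("LHHILDAQKMVWNHR"))          # SEQ. ID NO: 30
```

Not every peptide recited in the group claims need match this pattern, so a `None` result reflects only the consensus as written here, not whether a peptide is biotinylated.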
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